LETTER FROM THE EDITOR


Full STEAM Ahead 


Dear School Science and Mathematics Journal Readers: 

I reiterate what I wrote in my first “Letter from the 
Editor” for School Science and Mathematics (SSM) 
(November, 2011): As per SSM’s mission, I vow to 
“promote research, scholarship, and practice that improves 
SSM and advances the integration of science and math- 
ematics.” In that editorial, I advocated for the addition of 
an “A” to Science, Technology, Engineering, and Math- 
ematics (STEM): Arts. My vision for the “A” reflected fine 
arts, performing arts, and music. Two years later, I have 
become more “liberal” in this vision for integration. If you 
indulge in corny catchphrases, you could say I am moving 
full “STEAM” ahead. 

In fact, I recently received an e-mail (July 1, 2013) from 
Dr. Santa Ono, President of the University of Cincinnati 
(UC), which both underscores and broadens my current 
appeal for the “A”: “I applaud the American Academy of 
Arts & Sciences—as well as The Andrew W. Mellon Foun- 
dation and the Carnegie Corporation which supported the 
Commission on the Humanities and Social Sciences—for 
their recent report on how we can reinvigorate support for 
liberal arts education,” Dr. Ono noted. “These disciplines 
remain at the core of our ability to discover, create, col-
laborate, imagine, innovate, inspire, analyze and reflect. 
We cannot allow ourselves or others to promote STEM at 
the expense of other disciplines, especially the humanities, 
arts and social sciences. Simply stated, higher education 
must find the means and the mechanisms to further invest 
in both areas, as each makes the other decidedly better” 
(italics added for emphasis). 

In a similar appeal (e-mail, July 1, 2013), Dr. Richard 
Miller, Chair of the University Faculty and Professor of 
Advanced Structures at UC, wrote about the value of a 
liberal arts education rather than an education that is 
simply seen as “directly impacting the economy in terms 
of making students either employable or in supporting the 
development of new products and processes.” These goals 
are certainly at the heart of many calls for STEM educa- 
tion. Professor Emeritus of English at Clemson University, 
Hallman Bell Bryant, made a comparable assertion when 
he called for students to consider the rewards of education 
that are not necessarily “material ones ... A course of 
study that fulfills a student’s intellectual and spiritual 
needs might well outweigh what sort of paycheck one 
earns” (The News Herald, December 6, 2012). 

As a parent, I identify with other parents who want their 
children to graduate from college and gain employment. 
As a citizen of the United States and the world, I also know 
that universities play vital roles in helping to grow the 
economy. Most importantly, I serve the students whom I am 
privileged to teach and mentor. My role as such is to stay 
true to my own philosophy of education while at the same 
time honoring the concerns of all of these stakeholders. 

Harvard’s Office of Admissions (Harvard Admissions, 
2009) defines a liberal arts education as one “conducted in 
a spirit of free inquiry undertaken without concern for 
topical relevance or vocational utility. This kind of learn- 
ing is not only one of the enrichments of existence; it is 
one of the achievements of civilization. It heightens stu- 
dents’ awareness of the human and natural worlds they 
inhabit. It makes them more reflective about their beliefs 
and choices, more self-conscious and critical of their pre- 
suppositions and motivations, more creative in their 
problem-solving, more perceptive of the world around 
them, and more able to inform themselves about the issues 
that arise in their lives, personally, professionally, and 
socially” (emphasis added). I endorse these worthy aspi- 
rations of a liberal arts education and argue that mathemat- 
ics, science, and STEM education have the potential to 
achieve these same goals. 

I espouse a view of mathematics as a human endeavor. 
Borasi (1991) proposed four pedagogical assumptions that 
have the potential to reform how school mathematics is 
taught: (a) mathematics as a humanistic discipline in 
which results are not absolute and immutable but are 
socially constructed and fallible; (b) knowledge as a 
dynamic process of inquiry, characterized by uncertainty 
and conflict, which leads to a continuous search for a more 
refined understanding of the world; (c) learning as a gen- 
erative process of meaning making, enhanced by social 
interactions; and (d) teaching as providing support for 
students as they search for their own understanding and as 
organizing the classroom as a community of learners 
engaged in creating mathematical knowledge. 

Obviously, mathematics is built upon pattern and logic. 
Just as undeniably, however, true mathematicians use cre- 
ativity. Unfortunately, that might be hard for nonmathe- 
maticians to accept (Are there nonmathematicians? That 
might be another editorial!), but it is true. The beauty of 
the logic and pattern is accentuated, even augmented, by 
the beauty of the creativity to apply it to problem solving. 

Unfortunately, school mathematics is all too often 
taught as procedures and rules to be memorized. To Dr. 
Ono’s chagrin, students are not encouraged to “discover, 
create, collaborate, imagine, innovate, inspire, analyze and 
reflect” upon the mathematics that they experience. If 
mathematics was taught using Borasi’s (1991) four peda- 
gogical assumptions, then Dr. Ono’s call to synthesize 
STEM and liberal arts education could become a reality. 

I believe our students need to understand and appreciate 
the connections between the disciplines—science, tech- 
nology, engineering, mathematics, and liberal arts. We, as 
teachers and teacher educators, can help in this endeavor. 

Thank you for your dedication to our SSM organization 
and the students that will be impacted by your efforts to 
endorse and conduct research, scholarship, and practice 
that improves SSM and advances the integration of 
science, mathematics, and liberal arts. When we disregard 
the silos created by the structures in higher education, 
STEM and liberal arts will each make the other decidedly 
better. Each can teach students to “discover, create, col- 
laborate, imagine, innovate, inspire, analyze and reflect.” 

Regards, 

Shelly Sheats Harkness, PhD 
SSM Journal Co-Editor, University of Cincinnati 
Scaffolding is a complicated construct that can take many forms, including both written and verbal forms. This 
research study focused on three elementary science classrooms where students were using a series of written scaffolds 
to guide explanation building. In each classroom, data were collected to document and study an additional type of 
scaffold, verbal scaffolds that the teachers provided to complement the written scaffolds. Findings suggested that some 
types of verbal scaffolds, such as navigational guidance, were universal and therefore cut across all three grade levels. 
On balance, other verbal scaffolds were more common with younger students in association with their first explanation- 
building science unit, such as a verbal scaffold that turned an open-ended question into a few multiple-choice options. 
Through the characterization of the types and range of verbal scaffolds that teachers provide, both in general and in response 
to audience, we can gain insights to inform both curricular design and professional development toward supported 
explanation building across target audience, time, and topic. 


A Framework for K-12 Science Education (National 
Research Council [NRC], 2012), the important document 
serving as the foundation for the Next Generation Science 
Standards (NGSS), outlines a new emphasis in science 
education on a smaller set of core content ideas and 
science practices for all students, including students in 
grades K–6. The document states, “building progressively 
more sophisticated explanations of natural phenomena is 
central throughout K–5, as opposed to focusing only on 
descriptions in the early grades and leaving explanation to 
the later grades” (NRC, 2012, p. 2-25). Research in science 
education consistently demonstrates that as early as the 
onset of formal elementary-age schooling, American stu- 
dents are capable of sophisticated scientific reasoning such 
as constructing explanations about focal science content 
(Metz, 2008; NRC, 2007). Several research groups have 
designed curricular units that focused on guiding students 
to construct scientific explanations as a means to promote 
deep conceptual understandings of focal science content. 
For example, Linn, Shear, Bell, and Slotta (1999) guided 
seventh and eighth grade students’ explanation building 
about the causes for the onset of deformities in frogs as a 
means to deepen conceptual understanding of selected 
concepts in genetics, biology, and chemistry. On balance, 
nearly all of the recent studies that focused on guiding 
explanation building selected a target audience of students 
in the secondary years of schooling (grades 7–12 in the 
United States). Therefore, while these studies have pro- 
vided a foundation for how curriculum developers and 
researchers might guide middle and high school students 
in explanation-building activities (e.g., Chin & Osborne, 
2010; McNeill & Krajcik, 2008), they do not provide 
specific guidance for how to support elementary students’ 
explanation building. This article addressed this gap in the 
research through a research study designed to examine the 
range and types of verbal supports teachers provide to 
guide elementary school students in explanation building 
around concepts in biodiversity and ecology. 


Conceptual Framework 
Extending Prior Work With Explanations 

Our work builds from several research studies that have 
explored different approaches to guiding students’ written 
development of evidence-based explanations. These 
studies draw on the work by Brown and Campione (1990) 
and others who argue that guiding students to write expla- 
nations leads to deeper conceptual understandings of 
science concepts because it challenges students to evalu- 
ate, integrate, and elaborate on their knowledge in impor- 
tant ways (e.g., McNeill, Lizotte, Krajcik, & Marx, 2006). 
Chin and Osborne (2010) provided empirical evidence for 
the use of question prompts, contrasting views, argument 
diagrams, and evidence statements in guiding secondary 
students toward more productive argumentation. 

The notion of scaffolding, with regard to teaching and 
learning, was first introduced in the context of tutoring. It 
described the “. . . process that enables a child or novice to 
solve a problem, carry out a task or achieve a goal which 
would be beyond his unassisted efforts” (Wood, Bruner, & 
Ross, 1976, p. 90). Over time, scaffolds can be minimized 
or removed so that students are encouraged to generate 
more aspects of the work on their own. Later, this idea 
of one-on-one scaffolding was connected to Vygotsky’s 
(1978) sociocultural perspective on learning and develop- 
ment, and it was expanded to include the idea of a teacher 
using a range of tools, such as language (both written and 
verbal), to scaffold the learning of a whole class (Palincsar 
& Brown, 1984; Stone, 1998). 

Several science education studies focus on the types of 
online and written scaffolds that are particularly useful for 
guiding scientific inquiry and complex reasoning (Krajcik, 
Blumenfeld, Marx, & Soloway, 2000; Quintana et al., 
2004; Reiser, 2004). These authors describe scaffolds that 
can reduce the complexity of the task by, for example, (a) 
sequencing the task for easier management, (b) providing 
componential guidance such as boxes for the different 
parts of a scientific explanation, (c) providing navigational 
guidance to help students monitor their progress, and (d) 
providing content-specific hints and prompts to guide stu- 
dents to distinguish salient from irrelevant variables. 

Despite a set of studies by different research teams 
focused on written scaffolds and explanation building, we 
recognize that we would be naive to assume that the 
written scaffolds are the only important form of scaffold- 
ing, or even the most important guidance, that might be 
associated with strong learning outcomes on posttests. We 
propose that studies, even our own, that study only written 
scaffolds are inevitably oversimplifying the set of cogni- 
tive supports that are assisting students in constructing 
scientific explanations. This realization led us to our 
current work to review literature that discusses teacher talk 
and teacher verbal supports that provide guidance to 
students (Scott, 1998). 

Teachers’ Talk and Verbal Supports to Guide 
Science Inquiry 

In this article, we define and study verbal scaffolds as 
the range of ways teachers verbally guide the construction 
of evidence-based explanations, both in addition to and 
along with provided written scaffolds in the curricular 
materials. Reform-based approaches to science inquiry 
emphasize the importance of the role of teachers as facili- 
tators who “... orchestrate discourse among students 
about scientific ideas,” (NRC, 2000, p. 22). A reasonable 
body of literature exists that explores the ways teachers 
can help to make scientific inquiry, including explanation 
building, accessible to students. van Zee and Minstrell 
(1997) examine the types of questions and responses 
teachers use to prompt high school students to think 
deeply about their own and their peers’ responses. In this 
work, van Zee and Minstrell (1997) study the effectiveness 
of guides that help students compare approaches or vali- 
date a consensus with evidence. McNeill and Krajcik 
(2008) studied instructional practices that teachers used to 
first introduce scientific explanation to middle school stu- 
dents, such as “making the rationale of scientific explana- 
tion explicit” or “defining scientific explanation” and its 
components. In a subsequent paper, these authors discuss 
the synergistic use of written scaffolds and pedagogical 
practices to guide fruitful explanation building among 
middle-school students (McNeill & Krajcik, 2009). These 
researchers suggest that teachers’ verbal supports may 
come in many different forms including: (a) sequencing 
tasks for easier student management, (b) using questions 
to break down difficult tasks into smaller pieces (similar to 
componential guidance), (c) highlighting key ideas about a 
concept, or (d) making connections to students’ everyday 
lives by providing examples and analogies (Krajcik et al., 
2000). 

Herrenkohl and Guerra (1998) utilized student roles as a 
means of peer support for the generation of predictions 
and explanations. In their studies, students were assigned 
different roles including intellectual roles or “thinking 
practices” and audience roles. Intellectual roles provided 
students with metacognitive guidance about what to share 
and explain during discussions. Audience roles offered 
a complementary purpose through the use of student 
prompting through reflective questions such as, “What is 
your prediction? What did you find out? Did your results 
support your theory?” (p. 448). Other studies similarly 
utilize teacher, peer, or self-prompted metacognition to 
engage in and encourage meta-talk through the 
explanation-building process (e.g., Herrenkohl, Tasker, & 
White, 2011; Leinhardt & Steele, 2005). Collectively, 
these studies suggest that verbal scaffolds emphasizing 
either cognitive or metacognitive support and that are 
presented in conjunction with written curricular scaf- 
folds may provide valuable reflection and support for 
elementary-age student participation in the procedural and 
cognitive aspects (Fleer, 1992) of scientific discourse and 
practices, such as generating scientific explanations. 

Building on this literature base, we designed our study 
to characterize the range of verbal scaffolds teachers spon- 
taneously provided in a set of elementary classrooms 
where students were working with an eight-week, inquiry- 
based curriculum emphasizing written scaffolds for expla- 
nation building. We were interested in collecting data from 
a purposeful sampling of elementary classrooms to 
provide an empirical base as to how, if or when elementary 
teachers introduce and guide complex ideas like “making 
a claim” and “providing supporting evidence” with stu- 
dents who are just embarking on explanation construction 
activities for the first time. In addition, we were interested 
in gathering data to determine whether or how teachers 
might differentially utilize verbal scaffolds for younger 
late-elementary students (e.g., fourth graders) as com- 
pared to slightly older students (e.g., sixth graders). 


Research Questions 

Because there are some suggestions but no agreed-upon 
measures of verbal scaffolds documented in the literature 
and virtually no discussion of verbal scaffolds for elemen- 
tary science, we adopted an exploratory study approach 
“to identify or discover important categories of meaning 
and to generate hypotheses for further research” (Marshall 
& Rossman, 1999, p. 33). We adopted a constant compara- 
tive analysis research approach (Glaser, 1965; Patton, 
2002) in order to engage in cycles of coding and analysis. 
A constant comparative analysis approach permitted the 
systematic identification and characterization of the range 
of verbal scaffolds our sample teachers were using along 
with the written scaffolds in the curriculum. For example, 
a first pass through the teacher transcript data would reveal 
a particular type of verbal scaffold. When we observed a 
similar meaning unit in other transcripts, we revisited and 
discussed both transcript cases and the corresponding 
codes in light of each other in order to determine whether 
codes needed to be combined or new codes were needed to 
capture both similar and different characterizations of their 
verbal supports. These cycles of coding and analysis con- 
tinued until coders reached consensus. In this manner, the 
constant comparative approach supported our ability to 
gather details of both the kinds and frequency of verbal 
scaffolds used by our sample of fourth, fifth, and sixth- 
grade teachers. The research questions explored were: 

1. What types of verbal scaffolds did teachers provide 
when guiding late elementary students to develop scien- 
tific explanations about focal science content? 

2. In what ways did teachers’ verbal scaffolds differen- 
tially complement written scaffolds presented in the 
inquiry-focused curricular programs? 


Method 
A major goal of The Center for Essential Science is to 
develop and evaluate replacement curricular units focused 
on guiding fourth through 10th graders in explanation 
building, prediction building, data collection, and analysis 
of focal science concepts in the life and environmental 
sciences. Our work is anchored in a learning progression 
framework, which focuses on the development and evalu- 
ation of sequential learning goals in science or mathemat- 
ics that (a) builds on what we know about how children 
learn, (b) spans multiple years and age bands, and 
(c) focuses on the development of complex thinking about 
a small number of focal topics (NRC, 2007). In our work, 
our learning progression serves as a template for the 
sequence and definition of learning goals across multiple, 
consecutive curricular units, assessments, and profes- 
sional development modules. Prior to this study, our 
research team developed a learning progression with two 
dimensions: a content dimension encompassing concepts 
in biodiversity and ecology across fourth, fifth, and sixth 
grades and a science practices dimension encompassing 
the creation of scientific explanations. 

The curricular units consist of eight weeks of activities 
that ground students’ learning in data collection about 
local organisms, emphasize repeated exposures to ecology 
and biodiversity content, and provide repeated practice 
with written guidance to the development of scientific 
explanations. Enacting the curricular units in several 
classrooms within a large, urban district allowed us to 
observe how the curriculum lent itself to modification to 
fit the needs of students with a diversity of learning needs 
and backgrounds. For example, we saw the curriculum 
translated into Arabic and Spanish to support students 
with different language and cultural backgrounds in 
guided knowledge building and explanation construction. 

One of the most challenging aspects of our work was the 
development of a set of written scaffolds that were embed- 
ded in our curricular units and designed to guide fourth, 
fifth, and sixth graders’ first construction of evidence- 
based explanations about focal content. In this work, we 
adopted a definition for scientific explanations that is 
modified from the definition of argumentation in Toulmin 
(2006), that was used in our earlier work (e.g., Songer, 
Kelcey, & Gotwals, 2009), and that is similar to the model 
used by some other researchers (e.g., McNeill et al., 
2006). In our definition, a scientific explanation contained 
three components, each of which is defined as follows: 

1. A claim. A claim is a complete sentence that answers 
the scientific question, 

2. Two pieces of evidence. Evidence are data that 
support the claim and that address the scientific question, 
and 

3. Reasoning. Reasoning is a scientific concept or defi- 
nition that links the claim to the evidence and that supports 
the scientific question. 
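
As an illustrative sketch only, and not part of the curricular materials, the three-component definition above can be expressed as a small data structure. The class name, field names, completeness check, and sample student response are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScientificExplanation:
    """Hypothetical container mirroring the three components defined above."""

    question: str
    claim: str = ""
    evidence: List[str] = field(default_factory=list)
    reasoning: str = ""

    def missing_parts(self) -> List[str]:
        """Return the names of the components that are still incomplete."""
        missing = []
        if not self.claim.strip():
            missing.append("claim")
        # The definition above asks for two pieces of evidence.
        if len([e for e in self.evidence if e.strip()]) < 2:
            missing.append("evidence")
        if not self.reasoning.strip():
            missing.append("reasoning")
        return missing


# Invented sample response, used only to exercise the check.
draft = ScientificExplanation(
    question="Which zone in the schoolyard has the highest biodiversity?",
    claim="Zone A has the highest biodiversity.",
    evidence=["Zone A has the highest richness."],  # only one piece so far
)
print(draft.missing_parts())
```

A check of this kind only tests whether each component is present; it says nothing about the scientific quality of the claim, evidence, or reasoning.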

Each student completed eight weeks of scaffold- 
rich explanation construction activities. The student 
workbooks for each grade contained several opportunities 
to practice explanation building as follows: seven scaffold 
explanations in fourth grade, nine scaffold explanations in 
fifth grade, and 10 scaffold explanations in sixth grade 
(Songer, 2006; Songer et al., 2009). The written explana- 
tion hints and prompts were developed and revised 
through several rounds of empirical evaluation, including 
a set of studies that evaluated the effectiveness of consis- 
tent support or faded support over the eight-week curricu- 
lar unit (Lee, 2003). Previous research results evaluating 
students’ abilities to develop explanations under various 
guided conditions were strong. In one recent study focused 
on written scaffolds for explanation-building with sixth 
graders, students demonstrated significant 
achievement gains on both multiple-choice and open- 
ended explanation-building tasks, with achievement gains 
particularly strong on the explanation-building tasks 
associated with a variety of content foci in the life sciences 
(Songer et al., 2009). For more information on the learn- 
ing progression and the achievement results, see Songer 
et al. (2009) or Songer and Gotwals (2012). 

Within the student notebooks, the written scaffolds 
came in two forms: (a) explanation-construction scaffolds 
and (b) content scaffolds. Explanation-construction scaf- 
folds consisted of response boxes and sentences that 
defined the component of the explanation, such as the 
definition of evidence. Content scaffolds guided students 
to the particular location (e.g., a data table) or resource 
that was the source of the data serving as claim, evidence 
or reasoning. Figure 1 presents a sample fifth grade data 
sheet that contains both explanation-construction (larger 
boxes) and content (gray boxes) scaffolds. In all activi- 
ties, the scientific question at the top of the page was 
matched to the content standards required for our target 
audience. 

Sample 

The sample was 161 students in grades four through six 
and their six teachers in a large urban public school district 
within the midwestern United States. This district provides 
education for over 90,000 students in grades PK to 12. 
This urban district had a large minority student population 
with approximately 88% African American students. 
Almost 78% of the student population was economically 
disadvantaged according to the district profile (District 
Website, 2010). Two teachers from each grade level were 
selected using criterion sampling to be participants in this 
study (Patton, 2002). Teacher selection criteria included: 
(a) the teacher had at least one year of experience teaching 
this curricular unit in the past, (b) the teacher was actively 
completing the fourth, fifth, or sixth grade curricular unit 
with her students leading up to the lessons we wished to 
study, and (c) the teacher was using the curricular unit as 
her primary resource to teach instead of heavily modifying 
or using outside resources often to teach related topics. In 
each classroom, the individual teachers made all of their 
own decisions about how and when to teach the district- 
mandated science content. In other words, even though the 
six teachers were all following the same curricula, the 
research team granted complete autonomy to teachers for 
all pedagogical decisions. 

[Figure 1: student data sheet; text recovered from the image] 
Data Sheet 37 
Scientific Question: Which zone in the schoolyard has the highest biodiversity? 
Make a CLAIM: Write a complete sentence that answers the scientific question. 
    Hint: Think about your abundance and richness graphs carefully. 
[Heading illegible in source]: Write the scientific concept or definition that you thought about to make your claim. 
    Hint: Think about how biodiversity is related to abundance and richness. 
Give your EVIDENCE: Look at your data and find two pieces of evidence that help answer the scientific question. 
    Hint: Think about which zone has the highest abundance and richness. 

Figure 1. Sample student sheet from the fifth grade unit illustrating explana- 
tion construction (big rectangles) and content scaffolds (gray boxes). 

One teacher, Teacher B, taught both fourth and fifth 
grades, resulting in six classes of students taught by five 
different teachers. Video records were collected of each 
teacher guiding students to construct a scientific explana- 
tion in response to the same scientific question in each 
grade. Grade levels, teachers, number of students per 
grade, scientific questions and parts of the scientific expla- 
nation addressed in this lesson are available in Table 1. 

Table 1 
Simple Descriptors for Video Records 

Grade          Teachers   n    Parts of Explanation 
Fourth grade   A and B    52   Claim, Evidence 
Fifth grade    B and C    46   Claim, Evidence, Reasoning 
Sixth grade    D and E    63   Claim, Evidence, Reasoning 

Scientific Question and Activity in Observed Lesson 

Fourth grade. Data Sheet 14: “Is the animal you collected an insect?” Students use their observations of an animal they collected from their schoolyard to determine whether or not their animal is an insect. 

Fifth grade. Data Sheet 32: “Which zone in the schoolyard has the highest biodiversity?” Students use the data they collected in their schoolyard to determine which zone has the highest biodiversity. Data Sheet 37: “Which Michigan habitat has the highest biodiversity?” Students use provided data in the form of bar graphs to determine which Michigan habitat has the highest biodiversity [Figure 1]. 

Sixth grade. Data Sheet 15: “What will happen to the number of johnny darters in the river if the water becomes very muddy and dark?” Students look at their ecosystem diagram to determine what biotic and abiotic factors would be impacted by a muddy river. 

Each explanation building data sheet took approximately 
a 150-minute class period to complete. Researchers observed 
and taped the teachers’ discussions and lessons on the day of 
completion of the explanation building data sheet, as well 
as the days preceding and following the day the teacher guided 
the students through the data sheet we were studying. 
Depending on the teacher, some enactments took more or 
less than the 150-minute class period. Our research team 
also collected samples of student notebooks so student 
responses could be reflected upon in follow-up interviews 
with the teachers. Video cameras were focused on capturing 
the teachers’ actions and verbal talk throughout the day’s 
activities. The camera was placed at the back of the class- 
room as the teacher addressed the full class. When the 
teacher traveled to small groups, the camera traveled with 
the teacher to capture the teacher’s verbal interactions with 
the smaller groups of students. 
Data Analysis 

Coding steps. Video records of each teaching enact- 
ment were transcribed and analyzed by members of the 
research team. The constant comparative method (Glaser, 
1965) was utilized to code and analyze all transcript data. 
This process consisted of three steps: 

1. Identifying and comparing incidences applicable to 
each starting category (e.g., start codes), 

2. Implementing cycles of iterative coding toward inte- 
grating categories and their properties, and 

3. Organizing the final list of codes and examples. 

Since the researchers were familiar with the curriculum 
and how it was enacted, an initial “start-list” of codes 
describing possible verbal scaffolds was identified and 
revised through several iterations of coding (Miles & 
Huberman, 1994, p. 58). The start-list included codes 
about reading particular hints directly from the written 
worksheets and codes about using analogies. The 
transcripts were divided into “meaning units” at the sentence 
and paragraph level, and codes were applied to these units 
to describe the different verbal scaffolds the teacher was 
using to guide students’ construction of scientific expla- 
nations. For example, a meaning unit could be the follow- 
ing segment of transcript: 


Teacher E: Shalen, read the scientific question. 

S: What would happen to the number of johnny darters if 
the water in the river becomes very muddy and dark? 

Teacher E: What happens to the johnny darters if the 
water in the river becomes very muddy and dark? 
(Teacher E, classroom video 1:03, 12/17/2009) 



This segment could be coded as “orienting to the scientific 
question.” The next teacher comment made up another 
meaning unit that was coded as “orienting to the hint” for 
claims: 


Teacher E: Look at the hint. Look at the hint. Trey, 
read the hint for me. The gray box on the 
side. (Teacher E, classroom video 1:30, 
12/17/2009) 
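
As a minimal sketch of how coded meaning units such as the two excerpts above might be recorded and tallied, consider the following. The record layout, field names, and the `tally_codes` helper are assumptions made here for illustration; they do not describe the tools the research team actually used.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class MeaningUnit:
    """Hypothetical record for one coded segment of a teacher transcript."""

    teacher: str
    grade: str
    text: str
    code: str


# The two meaning units quoted above, both from Teacher E (sixth grade).
units = [
    MeaningUnit(
        teacher="E",
        grade="6th",
        text="Shalen, read the scientific question.",
        code="orienting to the scientific question",
    ),
    MeaningUnit(
        teacher="E",
        grade="6th",
        text="Look at the hint. Look at the hint. Trey, read the hint for me.",
        code="orienting to the hint",
    ),
]


def tally_codes(coded_units: Iterable[MeaningUnit]) -> Counter:
    """Count how often each code was applied within each grade."""
    return Counter((unit.grade, unit.code) for unit in coded_units)


counts = tally_codes(units)
for (grade, code), n in sorted(counts.items()):
    print(f"{grade}: {code} = {n}")
```

Keeping the transcript text alongside each code makes it straightforward to revisit earlier cases when a code is later split or merged.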


In the second step, codes evolved as the raters reviewed 
the transcripts and discussed the emerging codes and 
codebook after each iteration. For example, our start-list 
code for “reading the hint” was divided into two subcat- 
egories: (a) the teacher read the hint verbatim from the 
worksheet, or (b) the teacher oriented students toward the 
hint by reminding them where they could read about it. In 
the third step, codes were grouped into categories and 
subcategories that shared similar meanings, resulting in 
pattern codes that implied themes across the data (Miles & 
Huberman, 1994). Efforts were made to find both con- 
firming and disconfirming evidence for emerging themes. 

Table 2 
Types and Frequencies of Verbal Scaffolds by Grade 

Scaffold Type                       4th   5th   6th 
1. Orienting to hint                -     /     X 
2. Clarifying terms                 /     /     X 
3. Writing format                   X     X     / 
4. Formatting sentence content      /     /     - 
5. Directing to necessary content   X     X     X 
6. Answer options                   /     -     - 

X = 10-20 instances; / = four to nine instances; - = fewer than four instances. 

Examples 

1. Orienting to hint: “There is a hint right here, ‘think about your animal and if it has the same characteristics as an insect,’ you need to answer that question.” (Ms. A) 

2. Clarifying terms: Before students were able to discuss the interactions among different components of an ecosystem, Teacher E prompted them to clarify their understanding of the term ecosystem: “What is an ecosystem? Let’s go back. What’s an ecosystem, Tia?” (Ms. E) 

3. Writing format: Students were writing short phrases for the parts of their scientific explanation and Teacher B used writing format scaffolds to remind them to provide complete sentences: “It still has to be a sentence, you can’t just write, no sections, no body sections, no head. It still has to be a sentence.” (Ms. B) 

4. Formatting sentence content: “Is that kind of like a general idea? You guys gave me real specific things, you went way past general. You named organisms, you name how they would be affected. Bring it back down to something real general. When this happens, that happens. When this changes, it affects that.” (Ms. E) 

5. Directing to necessary content: The teacher guided students to provide evidence that muddy water would block sunlight and affect organisms that need sunlight to survive in an ecosystem. She guided students to think about the needs of plants so they could make this connection: “So, what would ... okay, Charles ... If it’s a plant, what do they need to grow? What do plants need to grow?” (Ms. D) 

6. Answer options: “Just answer the question ‘yes what I collected is an insect’ or ‘no what I collected is not an insect’ and stop” (Ms. A) 

Addressing reliability. During the coding process, two 
of the authors coded two transcripts to initially revise the 
start-list of codes. Check coding was completed through
discussion among coders whenever differences occurred,
followed by updates to the codebook through the addition,
deletion, combination, renaming, or redefinition of codes.
After the resulting codebook was established, both
researchers coded the remaining four transcripts sepa- 
rately. After full coding, the coding practices and results of 
both coders were compared again, and differences were 
discussed until agreement and clarification of the
codebook were reached. The researchers recoded the entire
data set together with the final codebook to ensure that 
they were applying the codes appropriately and consis- 
tently (Miles & Huberman, 1994, p. 64). 


Results and Discussion 
Range and Frequency of Verbal Scaffolds 
In response to our first research question, “What types of
verbal scaffolds did teachers provide when guiding late
elementary students to develop scientific explanations
about focal science content?”, our qualitative analysis
revealed six conceptual categories of verbal scaffolds
demonstrated by our teachers. Table 2 presents each of the
six conceptual categories, a representative example of 
each category, and general information about the fre- 
quency of each type detected in our fourth, fifth, and sixth 
grade observations. 
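
The frequency symbols in the Table 2 legend can be reproduced from a simple tally of coded meaning units. The following is a minimal sketch in Python, assuming each coded unit is stored as a (grade, scaffold code) pair. The function names, the data layout, and the sample units are hypothetical illustrations and are not part of the study's data or analysis tools; only the three bins (fewer than four, four to nine, and 10-20 instances) come from the legend.

```python
from collections import Counter

GRADES = ["4th", "5th", "6th"]


def frequency_symbol(count):
    """Map a raw count of instances to the Table 2 legend symbol."""
    if count < 0:
        raise ValueError("count must be non-negative")
    if count < 4:
        return "-"   # fewer than four instances
    if count <= 9:
        return "/"   # four to nine instances
    if count <= 20:
        return "X"   # 10-20 instances
    # The legend does not define a symbol above 20 instances.
    raise ValueError("count exceeds the range covered by the legend")


def symbol_table(coded_units, grades=GRADES):
    """Tally (grade, code) pairs and return {code: [symbol per grade]}."""
    counts = Counter(coded_units)
    codes = sorted({code for _, code in coded_units})
    return {
        code: [frequency_symbol(counts[(grade, code)]) for grade in grades]
        for code in codes
    }


if __name__ == "__main__":
    # Hypothetical coded units for illustration only (not study data).
    units = (
        [("6th", "clarifying terms")] * 12
        + [("4th", "clarifying terms")] * 5
        + [("4th", "answer options")] * 6
    )
    for code, symbols in symbol_table(units).items():
        print(code, dict(zip(GRADES, symbols)))
```

Binning the counts in this way keeps the table readable while signaling that the frequencies are approximate, which fits a small sample of observations.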

Orienting to hint. The first type, orienting to hint, 
involved the teacher helping students to focus on the 
various content scaffolds (e.g., written hint boxes) on their 
data sheets. Content hint boxes began with the phrase 
“think about ...” to help draw student attention to the 
science concepts they should draw on to write their claim, 
evidence or reasoning while constructing their scientific 
explanation. Teachers in our study oriented students to the 
content hint boxes in different ways, such as having a
student read the hints out loud to the class or turning the
hints themselves into questions for the students. For
example, one teacher said, “There is a hint right here, 
‘think about your animal and if it has the same character- 
istics as an insect,’ you need to answer that question” 
(Teacher A, classroom video 10:35, 5/24/2010). 

Another teacher simply directed students to read the 
hint: “Look at the hint. Look at the hint. Trey, read the hint 
for me. The gray box on the side” (Teacher E, classroom
video 1:30, 12/17/2009). And yet another teacher used the
hint as a jumping-off point for a series of questions that 
she felt would guide the students: 


T: Look at the hint. Read the hint for me, Tia. 

S: Think about how the biotic and abiotic compo- 
nents interact. 

T: Okay, let’s make sure we understand. What is 
biotic? What does that mean? (Teacher D, class- 
room video 8:42, 2/1/2010) 


Clarifying terms. Clarifying terms referred to a scaffold
in which the teacher paused the discussion until she had
helped the students review their
understanding of terms that were necessary for them to
either make sense of the written scaffolds or to come up
with other appropriate parts of a scientific explanation.
For example, Teacher B anticipated the importance of stu- 
dents understanding the terms abundance and richness as 
essential pieces of the larger concept of biodiversity prior 
to analyzing graphs to determine which Michigan habitat 
had the highest biodiversity: 


Teacher B: Okay, what’s biodiversity?

S: (hard to hear ...)

Teacher B: It’s the combination of both, right? So if
that’s the combination of both, and wetlands had a
combination of both, wouldn’t that be our answer, our
reason why we picked that? What does biodiversity really
mean? Anybody else need any help? Do you know what
biodiversity is? What’s biodiversity? The total (s/he
points at the graphs and word) ... abundance, AND,
richness, so that’s biodiversity. So, which one had the
highest of abundance (points to graph), “wetlands” and
which one had the highest richness, “wetlands.” Wetlands.
So biodiversity is what, total (points)

S: Abundance and richness. (Teacher B, classroom video
17.32, 6/1/2010)


Writing format. The third type of verbal scaffold, 
writing format, referred to the verbal support teachers 
provided to guide students on how to structure the written 
responses to each part of the explanation. For example, 
Teacher B said, “it still has to be a sentence, you can’t just
write, no sections, no body sections, no head. It still has to
be a sentence” when guiding her students to make a sci- 
entific claim that was a complete sentence (Teacher B, 
classroom video 22.17, 6/1/2010). This category of verbal
scaffolds did not necessarily define the parts of an
explanation. Rather, it directed students to complete
responses in ways that followed the customary norms of the
curricular activities.

Formatting sentence content. In contrast, formatting
sentence content referred to specific verbal scaffolds that
helped simplify more difficult open-ended instructions. In this
type of verbal scaffold, teachers simplified a difficult
written scaffold by reworking it into a more structured
verbal scaffold. Examples included cases where teachers
provided students with the start of the sentence the students
were to complete or translated an open-ended
instruction into a fill-in-the-blank statement to guide
their students’ development of that part of the explanation.
In one example, Teacher E used sentence starters to guide 
students into transforming more specific responses into 
appropriately general statements associated with the rea- 
soning component of a scientific explanation: 


Teacher E: Is that kind of like a general idea? You guys
gave me real specific things, you went way past
general. You named organisms, you name how they
would be affected. Bring it back down to something
real general. When this happens, that happens ...
When this changes, it affects that ... (Teacher E,
classroom video 19:01, 12/17/2009)


Similarly, Teacher C provided a case of formatting sentence
content scaffolds to help her students focus their
answers through a fill-in-the-blank format as follows:


Teacher C: In biodiversity you have to think about 
richness and abundance together, right, and so you 
have to give me an answer, well, which zone has 
highest biodiversity. So I think that is fine, but you 
need to tell me the rest of it ... I think that zone 
blankitty-blank, whatever you think, and where are 
you going to look for it? (Teacher C, classroom video 
22:11, 5/10/2010) 


Directing to necessary content. The fifth category, 
directing to necessary content, involved specific kinds of 
direction by teachers so that students could more easily
determine which aspects of the content information or data
were most important for their explanation building. In one
example of this category, Teacher D was leading a class 
discussion to guide students to design an explanation to
address the question, “What would happen to the johnny
darters if the river got muddy?” In her verbal discussion,
Teacher D directed students to focus on the conditions that 
would develop if the river was muddy. To achieve this goal, 
Teacher D initiated a series of questions that directed stu- 
dents to the necessary content. 

Answer options. The sixth category, answer options,
comprised the instances when the teacher provided two or more
possible answer choices for students, and then prompted
students to choose one to use for their response. This was
a highly scaffolded category, as this guidance acted
much like restructuring an open-ended response into
a multiple-choice question. This scaffold type occurred
when students were confused about what to provide relative
to a very specific written scaffold, such as a written
scaffold related to the development of a claim to address
the question, “Is the animal you collected an insect?” Teacher
A addressed the confusion among her students by provid- 
ing an answer options verbal scaffold that narrowed the 
range of choices of possible claims: 


Teacher A: You’re going to answer the question, yes 
it is or no it’s not. Don’t tell me because, why or 
anything else like that at this point. Just answer the 
question “yes what I collected is an insect” or “no
what I collected is not an insect” and stop. (Teacher A, 
classroom video 10:27, 5/24/2010) 


Verbal Scaffolds to Complement Written Scaffolds 

Our second research question focused on the ways in
which teachers’ verbal scaffolds differentially complement
the written scaffolds. In other words, we were interested
in determining patterns in the type or
frequency of verbal scaffolds used by teachers in the dif- 
ferent grade levels. This analysis was conducted in order to 
further characterize the range and amount of customi- 
zation of verbal scaffolds relative to their particular target 
audience, context, and written stimulus materials. As our 
data represent only a small sample of the total verbal 
scaffolds used by our teachers in association with the 
complete curricular unit, we do not intend to overgeneralize
our results beyond the instances and teachers in this
sample. Nevertheless, Table 2 and the subsequent discussion
suggest some potentially interesting trends.

One of our first observations was that certain types of 
verbal scaffolds tended to be used similarly across grade
levels, while others tended to be more frequently observed
at either younger or older grades in our sample. One type
of verbal scaffold, directing to necessary content, was
observed in similar amounts among all of our teachers. As
students needed to draw on data and information they may 
have collected days before working on their data sheets, 
directing to necessary content became a helpful role to 
guide student thinking. Even if students had worked with 
similar material in earlier construction of explanations,
our results suggest that directing to necessary content was 
important. For example, some of the sixth grade written
worksheets asked students to refer to multiple data sources
in the gathering of evidence for their explanations, and
teachers’ verbal scaffolds guided students to the
correct previous resources.

Our data suggest that other types of verbal scaffolds 
were more commonly observed in association with the 
sixth grade curricular unit when compared to fourth and 
fifth grade observations. Both orienting to hint and clari- 
fying terms were slightly more common with sixth grade 
teachers as compared to fourth or fifth grade teachers. We 
speculate that these slight differences can be explained, in 
part, because of the more complex scientific vocabulary in 
the sixth grade unit as compared to the fourth or fifth grade 
vocabulary. For example, sixth graders were expected to 
construct explanations by drawing on data and definitions 
associated with the terms biodiversity, abiotic, biotic, 
abundance, and richness. Fourth and fifth graders’ explanation
building, however, tended to focus on fewer
terms overall and on terms with more familiar real-world
meanings, such as insect, body part, or legs.

A third set of verbal scaffolds demonstrated slightly
higher frequencies among teachers of younger students. The
writing format scaffold was observed more commonly among
fourth and fifth grade teachers than among sixth grade teachers.
Our explanation of this outcome is that sixth graders likely 
needed fewer prompts about the use of complete sentences 
or the need for two pieces of evidence to support their 
claims both because their reading level was higher and 
because of their general familiarity with literacy norms 
and the explanation worksheet format. Similarly, there
were more instances of the answer options and formatting
sentence content scaffolds with younger students.
Answer options and formatting sentence content were
verbal scaffolds that simplified an open-ended activity
through either transformation into multiple-choice-type
options or provision of a sentence starter. These verbal
scaffolds provided additional structure and guidance for
the younger students’ early attempts at explanation building.
We speculate that teachers of these younger students
recognized that some of the younger students were not
ready to complete the explanation components on their
own, and therefore required structured options to guide
them into the correct answers.

Table 3
Examples and Frequency of Alternative Definitions of Evidence by Grade

D (6th)    E (6th)    B (5th)    C (5th)    A (4th)    B (4th)

Prove it.    Give me your proof.    Proof.    Without it [evidence], you could trick me.
How do I know that?    Your evidence may not be the same as your neighbor’s.    Prove it to me.    Prove it to you.
Is it true or not?    Take the data that goes with what we’re looking for.
Make certain the evidence is valid.
Total    0    4    6

Verbal Scaffolds That Provide Multiple Working 
Definitions of Abstract Terms 

As mentioned earlier and illustrated in Figure 1, one of 
the primary design features of our curricular units was the 
written activity sheets that contained two types of written 
scaffolds: explanation-construction scaffolds and content 
scaffolds. In designing these scaffolds, we intentionally 
kept the wording and format of the explanation- 
construction scaffolds identical throughout the units in 
order to provide consistent support in the definition of 
claim, evidence, and reasoning within and across units. 
For example, all data sheets used the same instructions to 
guide students in generating the reasoning portion of a 
scientific explanation as follows: “Write the scientific 
concept or definition that you thought about to make your 
claim.” In contrast, the written content scaffolds were 
unique to each scaffold explanation worksheet, as the 
content scaffolds needed to support the generation of a 
claim, evidence, or reasoning to match the scientific ques- 
tion and the specific topic area. For example, the content 
scaffold from Figure 1 reads, “Hint: Think about which 
zone has the highest abundance and richness.” 

Our data and analysis revealed that while all teachers 
used the research project definitions of claim, evidence 
and reasoning, our teachers supplemented these project 
definitions with their own alternative phrasings. For 
example, in order to help her fourth grade students have a 
rich understanding of the term “evidence,” one of our 
fourth grade teachers used four slightly different varia- 
tions of the definition of evidence in the verbal talk that we 
observed. These variations included, “proof,” “prove it to 
me,” “is it true or not?,” and “make certain the evidence is
valid.” To further characterize this type of verbal scaffold
that complemented the written claim, evidence, and rea- 
soning scaffolds, we gathered information on the amount 
and kinds of different working definitions that different 
teachers generated as verbal scaffolds to complement the 
written scaffolds focused on explanation construction. 

Our data reveal that individual teachers, including both 
of our fourth grade teachers, presented several different 
variations of their own phrasings of definitions when they
were first introducing the concept of evidence. Table 3 presents
a set of similar but not identical phrasings for the
term evidence that our teachers generated to guide their 
students in constructing scientific explanations. We found 
it interesting that both Teacher A and Teacher B provided 
at least three different variations of evidence as a way to 
emphasize the importance of the term and to help establish 
a strong understanding of this term across their students 
and curricular activities. 

Verbal rephrasing of evidence also had some similarities 
across teachers. For example, while at times all four of the 
teachers discussed alternative definitions of evidence 
as a form of “proof,” teachers’ phrasings of proof were 
not identical with each other, nor did individual 
teachers always use the same wording within their own
classroom. 

Table 4 presents similar but not identical phrasings for 
the term reasoning. The curriculum defines reasoning as “a 
scientific concept or definition that you thought of to make 
your claim.” The alternative definitions used for reasoning 
were phrases such as “supports your claim,” or “what 
would make you say that?” Another teacher asked students 
to come up with a “general idea.” In this way, we believe 
she was trying to help her students to utilize the idea of a 
scientific concept as a tool for understanding reasoning. 
Our data also support the statement that our teachers predictably
utilized more redefinitions of difficult terms such
as evidence and reasoning compared to terms more easily
understood, such as claim.

Table 4
Examples and Frequency of Alternative Definitions of Reasoning by Grade

D (6th)    E (6th)    B (5th)    C (5th)    A (4th)    B (4th)

Supports your claim    Tell me the reasons why you said that.    Reason that you said that.    Find out why, what was your thinking on this.
General idea    Everybody’s doesn’t necessarily have to be the same.    Why would you even say that?    What reason?
What made you say that?    the proof
What’s your reason why?    Why your evidence supports your claim.
How we got there.
Why did you choose this piece and not this piece?
Why do you say this evidence proves your point?
Where you got that reasoning from
Total    4    12    0


Discussion and Implications 

One of our early hypotheses was that, although we knew
our elementary teachers were interested in tailoring verbal
scaffolds to their target audience, their verbal scaffolds
might resemble either variations of the
written scaffolds or variations of secondary teachers’
verbal scaffolds observed by others. Interestingly, the
verbal scaffolds we observed showed some similarities to
and some differences from the types of written and verbal
scaffolds suggested by others’ research with secondary
students. Our data revealed two verbal scaffold types that
show similarities to the types of secondary student written 
scaffolds: Clarifying terms and directing to necessary 
content resemble content guidance and clarification 
observed by others. We suggest that our study confirmed 
the importance of these types of verbal scaffolds across 
age categories and context, as directing to necessary 
content was observed frequently by our teachers at all 
grade levels, and clarifying terms was prevalent or 
common across all three grades. 

Our work provides new insights about verbal scaffolds 
targeted for younger students through our observations of 
three of our scaffold types. The three types of verbal scaf- 
folds that were most common with younger students were 
not explicitly mentioned in others’ work with secondary 
students. These types (writing format, formatting sentence
content, and answer options) represent higher levels of
structure or scaffolding that might have been most beneficial
for younger students. For example, answer options
involved the teacher specifically rephrasing an open-ended 
question into one or two responses so students did not have 
to generate these options themselves. The observation of
these three new types of verbal scaffolds suggests that there
may be a set of more specific kinds of guidance and scaffolding
that were observed as valuable for younger audiences
just starting to learn about explanation building.
Our results illustrating the alternative working definitions
of key terms used by elementary teachers suggest
important customization of verbal scaffolds to help
younger students find meaning in the abstract, but important,
components of the scientific explanation. For example,
teachers might have used phrases like “proof” to help 
redefine evidence, as it is a common term that students 
might know from television police dramas or detective 
work. In this way, the teachers’ verbal scaffolds could help
connect students’ everyday language
with scientific practices. However, because the term
“proof” has different meanings in the courtroom, in
mathematics, and among scientists, we speculate that
the same word could introduce confusion in higher grade
levels when other uses of the word “proof” are introduced.
Similarly, teachers may have introduced the verbal scaf- 
fold of “a general idea” for reasoning to guide students to 
move away from a specific instance or case and toward a 
“general idea.” We speculate that this teacher was hoping 
to guide students to move beyond a particular instance 
toward a guiding idea or definition that could back or 
support relevant evidence. These observations also suggest
that the written scaffolds, while structured and permanent
on the page, may not be best used verbatim, which can leave
students simply going through the motions without grasping
a more substantive definition for each component. We
believe our data suggest that the written scaffolds provide 
a general tool for students to become familiar with the 
parts of the explanation, while verbal scaffolds, discus- 
sion, and prompting by the teacher and others can help 
students make these unfamiliar ideas their own. This use of 
more general tools that are used explicitly but flexibly is 
also discussed in the work of Herrenkohl, Palincsar,
DeWater, and Kawasaki (1999) with their intellectual 
tools, or thinking practices, that guide students as they 
construct explanations. 

Collectively, our results suggest that while written scaffolds
provide useful guidelines for teachers and students,
verbal scaffolds can help students connect the written
scaffolds and the abstract or unfamiliar scientific terms to
their own lives and experiences. We see evidence in our
work that the verbal scaffolds offer a range of different
kinds of supports or cognitive bridges between unfamiliar
scientific terminology and ideas and more familiar ideas and
practices. Our work also suggests some of the particular
ways in which teachers customized their verbal scaffolds 
to complement written materials and tailor instruction for 
their younger target audience. 

Our findings suggest implications for both curriculum 
developers and professional development design. Our 
work suggests particular types of verbal supports that 
teachers could use to guide younger students in explana- 
tion building. Curriculum developers, including ourselves, 
might provide a set of productive working definitions of 
important, abstract terms such as evidence and reasoning 
or examples of better or weaker analogies to guide 
younger students and their teachers in more productive 
explanation construction. Examples of how particular
verbal scaffolds may lead to productive or
unproductive student responses may help teachers plan how
they will use written scaffolds. For example, we envision a
resource that might accompany a curricular unit and
articulate the strengths and weaknesses of particular
working definitions. Defining evidence as a
“true fact,” for instance, may lead students to select true but
incorrect evidence, since not all true facts serve as appropriate
evidence to support a particular scientific question.
Our findings also have implications for professional devel- 
opment. Written materials and student work might accom- 
pany video images of interactions between the teacher and 
students. Teachers could follow their observations with a
review of students’ written answers and a discussion of the ways in which
verbal scaffolds may have supported or confused students.

Our work extends a dialogue on the kinds of guidance
and supports we need to guide students in explanation 
construction around focal science topics. If we are to expe- 
rience the kinds of success and understanding of science 
knowledge outlined in the Framework for K-12 Science 
Education (NRC, 2012) and the NGSS, we must extend 
both the dialogue and the empirical studies on explanation 
building toward a focus on elementary-age students or we 
risk falling short of the deeper conceptual learning we 
desire across grades, students, and contexts.

A variety of factors contributes to student achievement in mathematics, including but not limited to student behaviors
and student, teacher, and school characteristics. The purpose of this study was to explore which of these factors have an
to student achievement. More research on the relationships between these factors shown to make statistically significant
differences on mathematics achievement is needed to further explain several phenomena that this research reveals.


A variety of factors contributes to student achievement 
in mathematics, including but not limited to student, 
teacher, and school characteristics. These factors have 
been widely researched (e.g., Bottoms & Carpenter, 
2003), but a primary concern for those in the education 
community remains the factors that educators and schools 
can control. Factors that lie within the control of educators
and schools may allow for systematic or instructional
interventions that promote student persistence and success 
in mathematics. In this research, we were interested in 
both controllable and noncontrollable factors and their 
relationships with student achievement in mathematics. 


Predicting Mathematics Achievement 

Factors related to student achievement in mathematics 
can be categorized in multiple ways, including the ones 
that schools and teachers can control (e.g., faculty demo- 
graphics, instructional resources, and decisions) and the 
ones they cannot (e.g., student characteristics and out-of- 
school behaviors). We have chosen to partition these 
myriad factors into student factors and behaviors, teacher 
factors, and school factors, although some of the poten- 
tially influential factors could be considered in more than 
one category. We briefly discuss below each of these cat- 
egories and the related research literature. 
Student Characteristics 

Student characteristics that involve the environment 
outside the school, including influences of the student’s 
parents and home life, are beyond the control of the school 
and teachers. Student characteristics that have been shown
to be related to mathematics achievement include gender,
ethnicity, home environment, and educational aspirations. 
We discuss each of these characteristics below. 

Gender. Although many would argue that gender dif- 
ferences with respect to mathematics achievement are 
dependent on instructional and classroom environmental 
factors rather than innate differences between boys and 
girls, researchers have shown that gender differences in
favor of male students persist on standardized mathematics
tests and in upper-level mathematics courses, although
there have been recent declines in this gap (Hyde, Lindberg,
Linn, Ellis, & Williams, 2008). McGraw, Lubienski, and 
Strutchens (2006) found relatively small but consistent 
gender differences at the high end of the score distributions 
and primarily for high socioeconomic status (SES), White 
students, although there were also differences found for 
Latino students. This gender gap was also reflected signifi- 
cantly in advanced level mathematics courses (American 
Association of University Women, 1992). Although great 
strides have been taken to actively encourage and support 
girls in learning and achieving in mathematics (Koontz, 
1997), boys in all ethnic/racial groups are still outperform- 
ing girls on the SAT/Mathematics Test (Coley, 2001). 
Through the end of high school, however, girls have actu- 
ally been shown to earn better grades than their male 
counterparts (Kenney-Benson, Pomerantz, Ryan, & 
Patrick, 2006). One well-founded explanation for contin- 
ued differences in performance on mathematics assess- 
ments is stereotype threat (Blascovich, Spencer, Quinn, & 
Steele, 2001; Quinn & Spencer, 2001). According to Quinn
and Spencer (2001), “stereotype threat depresses females’
math performance through interfering with their ability to 
formulate problem-solving strategies” (p. 55). These 
researchers examined the performance level of men and 
women when completing word problems that include many 
steps that can cause doubt and frustration. Quinn and 
Spencer found that when doubt and frustration occurred, 
women responded by doubting their own skills. In instances 
when problems were converted into their numerical equiva- 
lents, women and men performed equally. 

Ethnicity. As with gender, the effects of ethnicity on
mathematics achievement are debated.
Peng and Hall (1995) found race and ethnicity to be
the largest contributing factors to mathematics achievement.
Studies conducted in various countries have shown
that the majority group achieves better than the minority 
group (Bouchey & Harter, 2005; Rothman & McMillan, 
2004). Bottoms and Carpenter (2003) found that African 
American students did not achieve in mathematics at the 
same level as their White counterparts. This disparity in 
achievement was explained by a difference in levels of 
rigor, support, and expectations offered by teachers and 
schools. Similarly, Adelman (1999), using data generated 
by the High School and Beyond longitudinal study, found 
that the college-access gap between White students and 
Black and Latino students remains 20% or higher, with SES 
contributing modestly and race/ethnicity very little. 

Home environment. The home environment, much 
like ethnicity, has been shown to have a significant effect 
on student achievement in mathematics. There is abundant 
support for an association between student achievement in 
mathematics and home background factors (Beaton et al., 
1996; Cooper & Cohn, 1997; Marjoribanks, 2002; 
O’Connor, Miranda, & Beasley, 1999). Factors considered 
here include SES and parent-education level. 

Student SES has been found to be closely related to 
student achievement in mathematics. The Coleman et al. 
(1966) report asserted that a student’s SES has a far greater 
association with achievement than any school characteris- 
tics. Several studies have since stated similar conclusions 
(e.g., Caldas & Bankston, 1997, 1999; Hanushek, 1986, 
1997; Steinberg, 1997). A National Council of Teachers of 
Mathematics (NCTM; 1998) task force found that students 
at the poverty level achieved at a significantly lower level 
than more affluent students. McCoy (2005) found that 
students who are non-White or poor or both were more 
likely to achieve a lower level in algebra. 

Parent education level has also been shown to have an 
impact on student achievement, although broad 
generalizations are difficult to make. According to Mullis, Martin, 
Gonzalez, and Chrostowski (2004), a higher level of 
parental education was linked to a higher level of student 
achievement in mathematics. However, Leibowitz (1977) 
found that although the mother’s education level had a 
significant effect on overall achievement, the father’s edu- 
cation level did not, suggesting that the mother’s educa- 
tion level was a proxy for the “quality and quantity of 
inputs to children’s development” (p. 247). Leibowitz 
concluded that reading with children had the biggest impact 
on children’s verbal development, whereas watching tele- 
vision was the most detrimental. 

Educational aspirations. There is a strong 
relationship between students’ educational aspirations and 
their academic achievement. Although a causal or direc- 
tional relationship is not supported by the literature, many 
researchers have found that high-achieving students have 
aspirations of matriculating to college and low achievers do 
not have such plans (U.S. Department of Education, 1997). 
Further, Fejgin (1995) found that of several family and 
student-related variables, students’ educational aspirations 
had the greatest impact on educational achievement. 

Overall, research on the relationships between several 
student characteristics and academic achievement has 
shown mixed results. Although some of the disparities in 
results may be attributed to differences in methodologies, 
others may be explained by using a wider lens to view the 
connection between students and achievement. Teachers 
and administrators have little to no control over their stu- 
dents’ characteristics or the behaviors in which students 
engage outside of school. Each of the student behaviors we 
investigated, however, is under the control of the parent 
and/or student. Through education of all stakeholders, the 
positive relationship between particular student behaviors 
and mathematics achievement can be exploited for the 
benefit of the student. Below, we examine the literature on 
those behaviors. 

Student Behaviors 

Student behaviors that have been shown to be related to 
mathematics achievement include use of calculators and 
computers, homework, and television viewing. We discuss 
each of these behaviors below. 

Use of calculator. There are mixed results in the litera- 
ture with respect to student uses of technology, including 
computers and calculators. Researchers have indicated, 
however, that the thoughtful use of calculators in the math- 
ematics class can have a positive effect on student achieve- 
ment. Ellington (2003) found that when calculators are 
used in both instruction and testing, positive effects on 
problem-solving skills occur, which result in positive 
effects on mathematics achievement. Hembree and 
Dessart (1986), in their meta-analysis of 79 non-graphing 
calculator studies, concluded that the use of handheld cal- 
culators improved student learning. Similarly, Heller, 
Curtis, Jaffe, and Verboncoeur (2005) concluded that the 
more algebra topics taught using graphing calculators, the 
higher the students’ test scores. 

Use of computer. Jacobson (2006) found that com- 
puter usage made no difference in student achievement 
when compared with textbook exercises for low- 
performing students, although students reported being 
motivated to do homework when it was to be done on a 
computer. Wong (2001) found similar results for improved 
attitudes related to computer usage, but found improved 
performance for students who were given drill-and- 
practice homework on the computer. Mullis et al. (2004) 
found that, in much the same way having books in the 
home was associated with higher literacy levels, having a 
computer at home was substantially beneficial for math- 
ematics achievement. 

Homework. The relationship between homework and 
mathematics achievement at the secondary level is not one 
on which all researchers have agreed. Kohn (2006) 
reported that homework had little to no effect on student 
achievement and criticized research reporting otherwise, 
but Marzano and Pickering (2007) argued that Kohn 
misinterpreted much of his reviewed research. Cooper, 
Robinson, and Patall (2006) provided the most recent syn- 
thesis of the research on homework concluding that home- 
work, overall, has a positive effect on student achievement. 
Weems (1998) provided further evidence that homework 
has benefits for achievement in algebra. Weems also 
reported a significant difference between performance of 
students in college intermediate algebra courses who 
turned in homework and those that did not. 

Television viewing. Findings on the relationship 
between television viewing and achievement are both 
mixed and inconsistent. Studies using panel data from the 
High School and Beyond survey found that, after control- 
ling for SES factors, there was not a relationship between 
the number of hours of television watched and achieve- 
ment, as measured by test scores (Gaddy, 1986; 
Gortmaker, Salter, Walker, & Dietz, 1990; Hofferth & 
Sandberg, 2001). Aksoy and Link (2000), however, found 
a negative association between the amount of television 
watched and test scores based on panel data from the 
National Education Longitudinal Study. This finding 
supports previous similar results (e.g., Bowen & Bowen, 
1998; Hornik, 1981; Keith, Reimers, Fehrmann, 
Pottebaum, & Aubey, 1986). Zavodny (2006) 
questioned any causal inferences and found that, 
for young adults, changes in television viewing time did 
not negatively affect achievement. In contrast, Paik (2000) 
found that there is a curvilinear relationship for high- 
ability students with positive correlation up to one hour of 
television viewing, whereas, for low-ability students, tele- 
vision viewing positively correlates with achievement. 
Rutkowski, Gonzalez, Joncas, and von Davier (2010) also 
refuted several previous studies by finding that the amount 
of television viewed was not significantly correlated with 
student achievement. 

Much like the research on student characteristics, results 
from the research on the relationships between several 
student behaviors and mathematics achievement are mixed. 
However, we can conclude, in general, for most learners, 
that the more related to cognitive development the behav- 
iors are, the stronger the positive relationship is with math- 
ematics achievement. Next, we examine the literature on 
teacher characteristics associated with student mathemati- 
cal achievement. 

Teacher Characteristics 

Teacher characteristics that have been shown to be 
related to mathematics achievement include teacher 
expectations, gender, ethnicity, and salary. We discuss 
each of these characteristics below. 

Expectations of students. Researchers have uncovered 
many factors that influence student achievement in math- 
ematics and, in particular, student achievement on state- 
mandated mathematics tests. Many of these factors are 
ones over which teachers or school administrators have some 
control. These factors, among others, include teacher 
expectations, teacher gender, teacher ethnicity, and teacher 
salary. The teacher plays a large role in student achieve- 
ment in mathematics and, by having high expectations for 
students, the teacher lets students know that learning is the 
goal and all students can learn (Bottoms & Carpenter, 
2003). High teacher expectations have been shown to 
affect reasoning skills more than verbal, memory, or 
motor intelligence quotient (Cundick, 1970; Fleming & 
Anttonen, 1971; Jussim & Harber, 2005; Rosenthal, 1994; 
Rosenthal & Rubin, 1978; Snyder, Shorey, & Rand, 2006). 
Middle grades and high school students who report that 
they experience moderate to high expectations in their 
classes have significantly higher mathematics achievement 
than students who do not report this (Bottoms & 
Carpenter, 2003). Bottoms and Carpenter did not report a 
significant difference between middle grades African- 
American and White students’ perceptions regarding the 
level of expectations they experience in their classes. 

Gender. Although there has been a significant amount 
of research on the relationship between mathematics 
achievement and the gender of students, few researchers 
have attempted to determine whether teacher gender is 
related to student achievement (Li, 1999). Saha (1983) 
examined the effects of teacher gender on mathematics 
achievement in 21 less-developed countries using the 
International Association for the Evaluation of Educa- 
tional Achievement data and determined that students of 
male teachers in science and mathematics had higher 
achievement in science and mathematics than those 
students with female teachers. Furthermore, Warwick and 
Jatoi (1994) suggest that teacher gender has a larger influ- 
ence on a student’s mathematics achievement than does 
the gender of the student. 

Ethnicity. The relationship between teacher ethnicity 
and race and student achievement is not agreed upon by 
researchers. Dee (2004) found a strong effect on math- 
ematics scores for both White and Black students paired 
with teachers of their own race. This finding contradicts 
previous findings that there is little association between 
achievement and racial pairings of students and teachers 
(Ehrenberg & Brewer, 1995; Ehrenberg, Goldhaber, & 
Brewer, 1995). Ferguson (1998) asserted that race does 
appear to influence student achievement but that this rela- 
tionship is uncertain in magnitude and often based on less 
than conclusive evidence. More recently, researchers have 
suggested that having a same-race teacher may influence 
achievement through occurrences of role model effects, 
stereotype threat, and teacher biases (Dee, 2004, 2005; 
Hanushek, Kain, O’Brien, & Rivkin, 2005). 

Salary. In a study that examined teacher resources and 
their impact on student achievement, teacher salary 
was positively related to student achievement in math 
and reading in Texas (Jones & Palazzolo, 2009). 
Additionally, Figlio and Kenny (2006) found that correlation 
coefficients indicated a strong positive relationship (r = .87) 
between teacher incentives and student achievement on 
test scores. Moreover, test scores were higher in schools 
that offered individual financial incentives for good 
performance. 

In addition to these teacher characteristics closely asso- 
ciated with student achievement in mathematics, school 
characteristics have been shown to impact student achieve- 
ment. We briefly examine some of these characteristics 
below. 

School Characteristics 

School characteristics have been shown to be potentially 
important factors related to student achievement in math- 
ematics. School size and its relationship to achievement 
has been a focus of research interest over many years. 
Smaller schools have been strongly linked to increased 
student achievement (Smith & Meier, 1995). Lee and 
Smith (1997) found that the ideal size for a high school 
was between 600 and 900 students. In schools larger than 
this range, student achievement was considerably lower. 
Raudenbush and Bryk (1986) found that school SES was a 
strong predictor of mathematics achievement with a stan- 
dardized regression coefficient of .82. Pong and Hao 
(2007), however, found that school SES is strongly asso- 
ciated with the grade point average of immigrants’ chil- 
dren but not of natives’ children. 

Need for Continuing Research 

Because there is little agreement in the literature on 
which factors, within and beyond the control of teachers 
and schools, are most closely related to student achieve- 
ment in mathematics, more research toward this determi- 
nation is needed so that teachers and schools can do 
everything possible to change policies or instructional 
design. These changes may take the shape of professional 
development for teachers, educational programs for 
parents, deliberate teaching assignments, or any number of 
other forms toward the goal of increasing achievement, but 
first, researchers, teacher educators, policy makers, school 
administrators, and teachers need to know which of these 
potential factors make the biggest difference related to 
achievement. This is where our research aims to make a 
difference. 


Method 

The purpose of this study was to explore student, 
teacher, and school factors that have an impact on student 
mathematics achievement. The research questions were: 

1. To what extent are student characteristics (i.e., gender, 
ethnicity, home environment, and educational aspirations) 
related to student achievement in mathematics? 

2. To what extent are student behaviors (i.e., use of 
calculator, use of computer, type and amount of homework, 
hours of television watched) related to student achievement 
in mathematics? 

3. To what extent are teacher characteristics (i.e., expec- 
tations for students, gender, ethnicity, and salary) related to 
student achievement in mathematics? 

4. To what extent are school characteristics (i.e., size and 
SES) related to student achievement in mathematics? 

Participants 

Data used in this study were from the North Carolina 
Department of Public Instruction (NCDPI). North Caro- 
lina students in Grades 9-12 who took Algebra II in 2006 
were the target population in this study. This population 
consisted of 64,980 students from 358 schools. Of this 
population, 63,522 (98%) were considered proficient in 
English, 988 (2%) were considered limited proficient in 
English, and 470 (1%) did not report their English profi- 
ciency status. To remove the possible confounding variable 
of English proficiency status and for the sake of the con- 
venience of data analysis, students who were considered 
limited English proficient or who had missing data were 
removed. This resulted in a sample size of 57,897, which is 
89% of the target population. 

Among the participants of this study, 32,165 (55.56%) 
were female and 25,732 (44.44%) were male. As for eth- 
nicity, 37,504 (64.78%) were White, 15,397 (26.59%) 
were African American, 2,683 (4.63%) were Hispanic, 
1,249 (2.16%) were Asian, and 1,064 (1.84%) were mul- 
tiracial. The majority (n = 43,568, 75.25%) were not eli- 
gible for the free or reduced-price lunch programs. The 
rest of the participants were either eligible for the free 
lunch program (n = 11,020, 19.03%) or reduced-price 
lunch program (n = 3,309, 5.72%). Parent educational 
background was also available. The parents of 2,430 
(4.20%) students did not finish high school, 10,100 
(17.44%) graduated from high school, 21,299 (36.79%) 
attended community college, 16,522 (28.54%) attained a 
four-year college degree, and 7,546 (13.03%) attained a 
graduate degree. 
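
As a quick consistency check on the counts reported above, the following sketch recomputes each percentage from its count and confirms that every breakdown sums to the sample of 57,897. The dictionary and function names are illustrative only and are not part of the study's analysis.

```python
# Participant counts copied from the Participants section.
SAMPLE_SIZE = 57_897

groups = {
    "gender": {"female": 32_165, "male": 25_732},
    "ethnicity": {
        "White": 37_504,
        "African American": 15_397,
        "Hispanic": 2_683,
        "Asian": 1_249,
        "multiracial": 1_064,
    },
    "lunch status": {"full price": 43_568, "free": 11_020, "reduced price": 3_309},
    "parent education": {
        "did not finish high school": 2_430,
        "high school graduate": 10_100,
        "community college": 21_299,
        "four-year college": 16_522,
        "graduate degree": 7_546,
    },
}


def percentages(counts, total):
    """Return each count as a percentage of the total, rounded to two decimals."""
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


for name, counts in groups.items():
    # Every breakdown should account for the full sample.
    assert sum(counts.values()) == SAMPLE_SIZE, name
    print(name, percentages(counts, SAMPLE_SIZE))
```
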

Operational Definitions of Variables 

Student demographic data were collected through 
the regular registration process at each school and then 
reported to NCDPI. Student achievement in mathematics 
was measured by Algebra II End of Course (EOC) tests 
administered in North Carolina. EOC tests are designed to 
assess the competencies defined by the North Carolina 
Standard Course of Study for each of the following 
courses: Algebra I, Algebra II, English I, Biology, Chem- 
istry, Geometry, Physical Science, Physics, Civics and 
Economics, and U.S. History. Tests are taken during the 
last 10 days of school or the equivalent for alternative. 
Only EOC test scaled scores for Algebra II were used in 
this study. The scaled scores ranged from 33 to 102. Stu- 
dents were considered to have insufficient mastery when 
their scores ranged from 33 to 45, inconsistent mastery 
when their scores ranged from 46 to 57, consistent mastery 
when their scores ranged from 58 to 68, and superior 
performance when their scores ranged from 69 to 102. 
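
The score bands described above can be expressed as a small lookup. This is a minimal sketch; the function name `mastery_level` is hypothetical, and the cut points are taken directly from the ranges stated in this section.

```python
def mastery_level(scaled_score: int) -> str:
    """Map an Algebra II EOC scaled score (33 to 102) to its achievement band."""
    if not 33 <= scaled_score <= 102:
        raise ValueError("scaled score must be between 33 and 102")
    if scaled_score <= 45:
        return "insufficient mastery"
    if scaled_score <= 57:
        return "inconsistent mastery"
    if scaled_score <= 68:
        return "consistent mastery"
    return "superior performance"


# The overall mean reported later in the paper (66.69) rounds to 67,
# which falls in the "consistent mastery" band.
print(mastery_level(67))
```
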

Variables used to measure student characteristics 
include Gender (female and male), Ethnicity (White, 
African American, Hispanic, Asian, and Multiracial), 
Parent Education Level (did not finish high school, high 
school graduate, community college, four-year college, 
graduate degree), and Lunch Program Status (free, 
reduced price, full price). Variables used to measure 
student behaviors include educational aspirations (seek 
employment after high school, enlist in military service, 
enroll in a business or trade school, enroll in a community, 
technical, or private junior college, and enroll in a four- 
year college), Use of calculator (student most often uses 
simple four-function calculator, fraction calculator, scien- 
tific calculator, and graphing calculator), Use of computer 
(student uses a computer at home for school work almost 
every day, twice a week, twice a month, hardly ever, and 
never), homework (no homework is ever assigned in this 
course, less than one hour each week for this course, 
between one and three hours, more than three but less than 
five hours, between five and 10 hours, and more than 10 
hours), Type of homework (student solves problems in 
textbook, works on worksheets, reads outside textbook, 
researches in the library or on the Internet, writes essay or 
lab reports), and hours of TV watched (student watches 
one hour or less each school day, two hours, three hours, 
four to five hours, six hours or more). Variables used to 
measure teacher characteristics include anticipated grade 
(teacher anticipated final grade for student in Algebra II to 
be A, B, C, D, or F), teacher gender (coded 1 for male and 
0 for female), teacher ethnicity (coded 1 for White and 0 
for all other ethnicities), and teacher salary (annual 
income in thousands of American dollars); variables used 
to measure school characteristics include school size (total 
number of students enrolled in 2006) and school SES 
(percentage of free/reduced-price lunch students in a 
school). 

Data Analysis 

Two analysis of variance (ANOVA) models were exam- 
ined for group differences in mathematics achievement: 
(a) a model with demographic information only and 
(b) a model with homework assignment and teacher expec- 
tation while controlling student demographic information. 
Post-hoc multiple comparisons were conducted with 
Bonferroni adjustment. Cohen’s (1988) threshold for the 
effect size (η²) of main effects was followed: small (.01), 
medium (.06), and large (.14). 
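
To illustrate what the Bonferroni adjustment implies for the post-hoc comparisons, the sketch below computes the per-comparison significance level for all pairwise contrasts among a set of groups. The family-wise level of .05 is an assumption (the text does not state it), and the function name is illustrative.

```python
from math import comb


def bonferroni_alpha(num_groups: int, family_alpha: float = 0.05):
    """Return (number of pairwise comparisons, per-comparison alpha).

    family_alpha = 0.05 is an assumed family-wise error rate.
    """
    num_comparisons = comb(num_groups, 2)
    return num_comparisons, family_alpha / num_comparisons


# Five ethnicity groups give 10 pairwise comparisons,
# so each comparison is tested at .05 / 10 = .005.
print(bonferroni_alpha(5))
# Three lunch-status groups give 3 pairwise comparisons.
print(bonferroni_alpha(3))
```
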

A three-level Hierarchical Linear Modeling (HLM) 
method was employed to examine individual predictors of 
student achievement in mathematics including homework 
assignment, teacher expectation, and student demographic 
information while taking into consideration the nested 
nature of the data (students were nested within teachers, 
while teachers were nested within schools). An uncondi- 
tional model was run first to check the amount of variance 
at each level and potential variables to include in the 
conditional model (Raudenbush & Bryk, 2002). The 
unconditional model is as follows: 

Level 1: 

\[ Y_{ijk} = \pi_{0jk} + e_{ijk} \]

where 
\(Y_{ijk}\) is the EOC Algebra II scaled scores for student i taught 
by teacher j in school k; 
\(\pi_{0jk}\) is the estimated EOC Algebra II scaled scores for a 
student who has no educational aspirations, does not use 
calculator at all, has no homework assignment, does not 
watch TV at all, and the teacher’s anticipated grade for the 
student is F (fail); 

Level 2: 

\[ \pi_{0jk} = \beta_{00k} + r_{0jk} \]

where 
\(\beta_{00k}\) is the estimated average EOC Algebra II scaled score 
for all students taught by a non-White female teacher with 
an annual salary less than $1,000 in school k; 

Level 3: 

\[ \beta_{00k} = \gamma_{000} + \mu_{00k} \]

where 
\(\gamma_{000}\) is the estimated EOC Algebra II scaled scores for a 
student in a school with no students eligible for free/ 
reduced-price lunch program; 

Simple conditional models were run for each variable, 
and only variables that were statistically significant were 
included in the complete conditional model to examine the 
effect of a single variable while controlling that of all other 
variables. The three-level complete conditional model is as 
follows: 

Level 1: 

\[
\begin{aligned}
Y_{ijk} = {} & \pi_{0jk} + \pi_{1jk}(\text{Aspiration})_{ijk} + \pi_{2jk}(\text{Calculator})_{ijk} + \pi_{3jk}(\text{Homework})_{ijk} \\
& + \pi_{4jk}(\text{TV})_{ijk} + \pi_{5jk}(\text{Anticipated Grade})_{ijk} + e_{ijk}
\end{aligned}
\]

where 

\(\pi_{1jk}\) is the estimated change of EOC Algebra II scaled 
scores with a unit increase of student educational aspirations, 
while all other predictors stay at the same levels; 
\(\pi_{2jk}\) is the estimated change of EOC Algebra II scaled 
scores with a unit increase of student use of calculator, 
while all other predictors stay at the same levels; 
\(\pi_{3jk}\) is the estimated change of EOC Algebra II scaled 
scores with a unit increase of student homework assignment, 
while all other predictors stay at the same levels; 
\(\pi_{4jk}\) is the estimated change of EOC Algebra II scaled 
scores with an increase of an hour that the student watches 
TV, while all other predictors stay at the same levels; 
\(\pi_{5jk}\) is the estimated change of EOC Algebra II scaled 
scores with a unit increase of the teacher’s anticipated 
grade of that student, while all other predictors stay at the 
same levels; 

Level 2: 

\[
\begin{aligned}
\pi_{0jk} &= \beta_{00k} + \beta_{01k}(\text{Ethnicity})_{jk} + \beta_{02k}(\text{Gender})_{jk} + \beta_{03k}(\text{Salary})_{jk} + r_{0jk} \\
\pi_{1jk} &= \beta_{10k} \\
\pi_{2jk} &= \beta_{20k} \\
\pi_{3jk} &= \beta_{30k} \\
\pi_{4jk} &= \beta_{40k} \\
\pi_{5jk} &= \beta_{50k}
\end{aligned}
\]

where 

\(\beta_{01k}\) is the estimated difference in EOC Algebra II scaled 
scores for a student taught by a White teacher versus a 
student taught by a non-White teacher if the teacher’s 
gender and salary are the same in school k; 

\(\beta_{02k}\) is the estimated difference in EOC Algebra II scaled 
scores for a student taught by a male teacher versus a 
student taught by a female teacher if the teacher’s ethnicity 
and salary are the same in school k; 

\(\beta_{03k}\) is the estimated change of EOC Algebra II scaled 
scores with a unit increase of the student’s teacher’s salary 
in thousand dollars if the teacher’s ethnicity and gender 
are the same in school k; 

No predictors were added to \(\pi_{1jk}\), \(\pi_{2jk}\), \(\pi_{3jk}\), \(\pi_{4jk}\), or \(\pi_{5jk}\) 
because these were assumed to be the same regardless of 
teacher characteristics. These parameters were treated as 
fixed. 

Level 3: 

\[
\begin{aligned}
\beta_{00k} &= \gamma_{000} + \gamma_{001}(\text{SES})_{k} + \mu_{00k} \\
\beta_{10k} &= \gamma_{100} \\
\beta_{20k} &= \gamma_{200} \\
\beta_{30k} &= \gamma_{300} \\
\beta_{40k} &= \gamma_{400} \\
\beta_{50k} &= \gamma_{500}
\end{aligned}
\]

where 

\(\gamma_{001}\) is the estimated change of EOC Algebra II scaled 
scores for a student with a unit increase of the percentage 
of students eligible for free/reduced-price lunch program 
in a school; 

No predictors were added to \(\beta_{10k}\), \(\beta_{20k}\), \(\beta_{30k}\), \(\beta_{40k}\), or \(\beta_{50k}\) 
because these were assumed to be the same regardless of 
school characteristics. These parameters were fixed. 
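
Substituting the level-2 and level-3 equations into the level-1 equation gives a single combined form of the complete conditional model, which may make the fixed and random parts easier to see. The sketch below rests on one assumption that the level-3 equations do not state: the teacher-level slopes are taken as fixed across schools, written here as \(\beta_{0qk} = \gamma_{0q0}\) for q = 1, 2, 3. The symbols \(\gamma_{010}\), \(\gamma_{020}\), and \(\gamma_{030}\) are introduced only for this illustration.

```latex
\[
\begin{aligned}
Y_{ijk} = {} & \gamma_{000} + \gamma_{001}(\text{SES})_{k} \\
& + \gamma_{010}(\text{Ethnicity})_{jk} + \gamma_{020}(\text{Gender})_{jk} + \gamma_{030}(\text{Salary})_{jk} \\
& + \gamma_{100}(\text{Aspiration})_{ijk} + \gamma_{200}(\text{Calculator})_{ijk} + \gamma_{300}(\text{Homework})_{ijk} \\
& + \gamma_{400}(\text{TV})_{ijk} + \gamma_{500}(\text{Anticipated Grade})_{ijk} \\
& + \mu_{00k} + r_{0jk} + e_{ijk}
\end{aligned}
\]
```

In this form the first four lines are the fixed effects, and the last line collects the school-level, teacher-level, and student-level random terms.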


Results 

Descriptive statistics of students’ achievement in math- 
ematics classified by demographic information are pre- 
sented in Table 1. 

The unconditional HLM model suggested a statistically 
significantly large variance to be explained at the teacher 
level but not at the school level in general (Table 2). Spe- 
cifically, however, school-level SES was suggested as a 
statistically significant predictor at the school level. As a 
result, school SES was kept as the only predictor at level 3. 
The unconditional HLM model also suggested teacher- 
level variables, such as teacher gender, teacher ethnicity, 
and teacher salary as statistically significant level-2 pre- 
dictors. Therefore, these variables were included in the 
conditional model at level 2. Level 1 includes student-level 
variables such as educational aspiration, use of calculator, 
use of computer, homework, hours of TV watched, and 
anticipated grade. Parameter estimates of the uncondi- 
tional and conditional HLM models are presented in 
Tables 2 and 3, respectively. Because the HLM models had 
three levels, only students whose teacher and school infor- 
mation was available were included in the data set. This 
resulted in a sample of 28,258 students (48.81% of the 
sample used in the ANOVA), 530 teachers, and 108 
schools. Student demographic information variables and 
type of homework were not included in the HLM model 
because these factors were examined in the ANOVA 
models. 

Table 1 
Descriptive Statistics of Students’ Mathematics Achievement by Demographic Information 

Demographic Information                                                M      SD
Gender                    Female (n = 32,165)                      66.38   10.06
                          Male (n = 25,732)                        66.97   10.74
Ethnicity                 White (n = 37,504)                       68.93   10.01
                          African-American (n = 15,397)            61.03    8.97
                          Hispanic (n = 2,683)                     64.73    9.50
                          Asian (n = 1,249)                        71.62   11.44
                          Multiracial (n = 1,064)                  66.19   10.18
Parent educational level  Did not finish high school (n = 2,430)   62.71    9.24
                          High school graduate (n = 10,100)        63.62    9.44
                          Community college (n = 21,299)           65.42    9.70
                          Four-year college (n = 16,522)           68.25   10.50
                          Graduate degree (n = 7,546)              71.86   10.84
Lunch program status      Free (n = 11,020)                        62.35    9.24
                          Reduced-price (n = 3,309)                64.55    9.55
                          Full-price (n = 43,568)                  67.89   10.39
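
One way to read the variance components of the unconditional model is as shares of the total variance at each level. The sketch below computes those shares from the three variance estimates reported in Table 2 (66.63 for the student-level residual, 40.43 at the teacher level, 2.46 at the school level). The variable names are illustrative, and this decomposition is offered as a reading aid rather than as a calculation the authors report.

```python
# Variance estimates from the unconditional model (Table 2).
variances = {
    "student (residual)": 66.63,
    "teacher": 40.43,
    "school": 2.46,
}

total_variance = sum(variances.values())

# Share of total variance attributable to each level.
shares = {level: value / total_variance for level, value in variances.items()}

for level, share in shares.items():
    print(f"{level}: {share:.1%} of total variance")
```

Under these estimates, roughly 61% of the variance lies between students within classrooms, about 37% between teachers, and about 2% between schools, which agrees with the finding that there was substantial variance to explain at the teacher level but little at the school level.
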

Magnitude of effect, or proportion of variance explained 
by the complete conditional model in comparison with the 
unconditional model, was calculated by one minus the 
ratio between the estimated variance of the complete con- 
ditional model and that of the unconditional model. The 
magnitude of effect turned out to be very satisfactory: 
41% at level 1 (the student characteristics included in the 
complete conditional model explained 41% of the vari- 
ance of student achievement in Algebra II); 42% at level 2 
(the teacher characteristics included in the complete con- 
ditional model explained 42% of the differences across 
classrooms in Algebra II); and 71% at level 3 (the school 
characteristics included in the complete conditional model 
explained 71% of the differences across schools in 
Algebra II). Detailed analyses addressing the research 
questions are in the sections that follow. 
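
The magnitude-of-effect calculation described above (one minus the ratio of conditional to unconditional variance) can be sketched as follows. The conditional variances are not reported in this section, so the example back-calculates the values implied by the reported percentages and the unconditional variances in Table 2. Those implied values are illustrative, not reported estimates, and the function names are hypothetical.

```python
def proportion_explained(unconditional_var: float, conditional_var: float) -> float:
    """Proportional reduction in variance: 1 - conditional / unconditional."""
    return 1 - conditional_var / unconditional_var


def implied_conditional_variance(unconditional_var: float, proportion: float) -> float:
    """Invert the formula: the conditional variance implied by a reported proportion."""
    return unconditional_var * (1 - proportion)


# (unconditional variance from Table 2, reported proportion explained)
levels = {
    "level 1 (student)": (66.63, 0.41),
    "level 2 (teacher)": (40.43, 0.42),
    "level 3 (school)": (2.46, 0.71),
}

for level, (uncond, reported) in levels.items():
    implied = implied_conditional_variance(uncond, reported)
    # Round trip: applying the formula to the implied value recovers the proportion.
    recovered = proportion_explained(uncond, implied)
    print(f"{level}: implied conditional variance {implied:.2f}, explained {recovered:.0%}")
```
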

Student Characteristics and Achievement 

The model with students’ demographic information 
only suggested a statistically significant interaction effect 
between parent educational level and student lunch 
program status, F(8, 57,747) = 5.91, p < .001, partial η² = 
.001. Although the effect size (partial η²) is very small, 
the trend of this interaction is worthy of analysis (see 
Figure 1). 

The plot of this interaction effect suggested that the 
performance of students classified by school lunch 
program status varied by their parents’ educational level. 
For students whose parents had graduate degrees, those 
who were eligible for the free lunch program performed 
better than those who were eligible for the reduced-price 
lunch program. Both groups of students did not perform as 
well as students in the full-price lunch program. For stu- 
dents whose parents had four-year college degrees, those 
who were in the full-price lunch program performed better 
than those in the reduced-price lunch program, whereas 
students in the reduced-price lunch program performed 
better than those in the free lunch program. For students 
whose parents had community college degrees, or are 
trade or business school graduates, no significant differ- 
ences were noticed between performances of students in 
the full-price and reduced-price lunch program although 
both groups outperformed students in the free lunch 
program. For students whose parents had a high school 
diploma only, those in the reduced-price lunch program 
performed better than those in the full-price lunch 
program, and students in the full-price lunch program 
performed better than those in the free lunch program. For 
those whose parents did not finish high school, no 
significant differences were noticed between performances of 
students in the full-price lunch and free lunch program, but 
students in the reduced-price lunch program outperformed 
the other two groups (see Figure 1). Gender difference, 
however, was not found to be statistically significant, F(1, 
57,747) = .16, p = .69, partial η² < .001. This means that 
male and female students performed equally well on 
mathematics tests if the student ethnicity, parent educational 
level, and school lunch program status were controlled. 
The main effect of ethnicity was noticed, F(4, 55,747) = 
229.45, p < .001, partial η² = .02. Each group was 
statistically significantly different from each other. The order of 
student ethnic groups placed by the performance in 
Algebra II was: Asian, White, Multiracial, Hispanic, and 
African American. The main effect of lunch status was 
also noticed, F(2, 55,747) = 19.82, p < .001, partial η² = 
.001. Statistically significant differences were noted 
among students with full-price lunch, reduced-price lunch, 
and free lunch status. 

Table 2 
Estimates for the Unconditional Model With Exploration of Possible Level II & III Predictors 

                                       Fixed Effects                              Random Effects
Estimated Parameters     Coefficient          SE           t           p    Variance          χ²           p
Intercept                      66.69         .34      196.03       <.001
Teacher level                      –           –           –           –       40.43   13,686.22       <.001
School level                       –           –           –           –        2.46      131.22        .056
Residual                           –           –           –           –       66.63           –           –

Possible Level II & III Predictors
                         Coefficient          SE           t
Teacher gender                 −2.47         .93       −4.11
Teacher ethnicity                .93         .21        4.41
Teacher salary                  .001        .000        1.83
School SES                    −.0853       .0157       −5.44
School size                    .0018       .0022         .83

SE = standard error. 

Table 3 
Estimates for the Conditional Model 

                                        Simple Conditional Model                        Complete Conditional Model
Estimated Parameters         Coefficient          SE           t           p Coefficient          SE           t           p
Student level
  Educational aspirations           1.24         .07       16.87        <.01         .38         .05        7.67       <.001
  Use of calculator                 2.28         .14       15.80        <.01        1.06         .11        9.53       <.001
  Use of computer                   −.05         .04       −1.21         .23           –           –           –           –
  Homework                          1.45         .07       21.96        <.01         .58         .05       11.79       <.001
  Hours of TV watched        [illegible]         .05       −7.74        <.01 [illegible]         .03       −3.39        .001
  Anticipated grade                 4.77         .06       79.76        <.01        4.55         .06       81.54       <.001
Teacher level
  Ethnicity                          .91 [illegible] [illegible]        <.01         .67         .16        4.22       <.001
  Gender                     [illegible]         .60       −3.86        <.01       −2.16 [illegible]       −3.80       <.001
  Salary                             .78         .36        2.18         .03 [illegible] [illegible]        2.24         .03
School level
  School SES                 [illegible]         .93        −.98         .33           –           –           –           –

SE = standard error; SES = socioeconomic status. 

The main effect of parent educational level was also 
noticed, F(4, 55,747) = 17.47, p < .001, partial η² = .001.
Students whose parents had graduate degrees had statistically
significantly higher scores in mathematics than students in any
other group classified by parent educational level. The
trend was that the higher the level of education their 
parents had attained, the higher scores the respective stu- 
dents achieved in mathematics (see Table 1). 
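
The effect sizes reported above can be checked against the F statistics with the standard identity partial η² = (F × df_effect) / (F × df_effect + df_error). The short sketch below applies that identity to the three main effects quoted in the text. The function name and the dictionary layout are illustrative assumptions, not part of the original analysis.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, df_effect, df_error) exactly as reported in the text above.
reported = {
    "ethnicity": (229.45, 4, 55_747),
    "lunch status": (19.82, 2, 55_747),
    "parent educational level": (17.47, 4, 55_747),
}

for effect, (f_value, df_effect, df_error) in reported.items():
    eta_sq = partial_eta_squared(f_value, df_effect, df_error)
    print(f"{effect}: partial eta squared = {eta_sq:.4f}")
```

Rounded to the precision used in the text, these values agree with the reported .02, .001, and .001. The very large error degrees of freedom explain why effects this small are still statistically significant.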

Figure 1. Interaction effect of student lunch program status and parent
educational level on estimated marginal means of Algebra II scale scores. The
horizontal axis represents parent educational level, where 1 = did not finish
high school; 2 = high school graduate; 3 = community college, some education,
trade and business; 4 = four-year college; and 5 = graduate degree.


Educational aspirations were found to have a signifi- 
cantly positive impact on student academic achievement in 
mathematics, t(28,246) = 7.67, p < .001 (see Table 3). The
higher students’ educational aspirations, the higher the
score they received on the Algebra II EOC test.

Student Behaviors and Achievement 

In the investigation of the effect of calculator use on
mathematics achievement, it was determined that the more
complex the calculator students used, the higher the score
they received on the Algebra II EOC test, t(28,246) = 9.53,
p < .001. Student use of computers did not have a significant
impact on their performance in mathematics, t(28,246) =
−1.21, p = .23. The hours of television watched, however,
had a significantly negative impact on student achievement
in mathematics, t(28,246) = −3.39, p = .001. The more
television students watched, the lower the score they received
on the Algebra II EOC test.

It was ascertained that both the amount of homework
and the teacher’s anticipated grade positively impacted the
student’s score on the Algebra II EOC test: the more homework
students did, and the higher the grade teachers expected
students to earn, the higher the score they received.

The second ANOVA model with homework assignment
and teacher expectations suggested a statistically significant
main effect of homework assignment, F(5, 57,777) =
50.23, p < .001, partial η² = .004, and a statistically
significant main effect of teacher expectations, F(4, 57,777) =
860.55, p < .001, partial η² = .06. Significant differences
were noted between students who were assigned homework
but did not do it (M = 64.02, SD = 10.72) and students
whose teachers never assigned homework (M = 58.66, SD
= 9.79). The more homework teachers assigned to students,
the better these students performed on the mathematics
test. Student mathematics achievement was also
found to be significantly different based upon the type of
homework assignment received from the teacher, F(5,
57,777) = 196.26, p < .001, partial η² = .02. Students who
were assigned homework to read outside the textbook (M
= 64.85, SD = 10.16) had significantly lower mathematics
achievement than any other group. Students who were
assigned to work on worksheets (M = 67.23, SD = 10.36)
performed better than students who had homework assignments
to solve problems in the textbook (M = 66.87, SD =
10.12) but worse than students who were assigned to
research in the library or on the Internet (M = 67.73, SD =
10.85) or to write essays or lab reports (M = 67.79, SD =
10.85). No statistically significant differences were
noticed between students who were to do research in the
library or on the Internet and students who were to write
essays or lab reports as homework assignments.
Teacher Characteristics and Achievement 

All teacher-level variables we investigated were found to 
be statistically significant, impacting student achievement 
in mathematics. Students who were anticipated to get an A 
for the mathematics course (M = 76.55, SD = 9.04) per- 
formed better than those who were anticipated to get a B 
(M = 69.43, SD = 8.66). Similar patterns were found for 
students who were anticipated to get a C (M = 64.18, SD =
8.43), D (M = 60.52, SD = 7.94), or F (M = 56.55, SD =
7.76) for their mathematics course. Students taught by
male teachers performed significantly worse than students
taught by female teachers, t(526) = −3.80, p < .001. On
average, students taught by male teachers scored 2.16 points
lower than students taught by female teachers. Students
taught by White teachers performed better than those
taught by teachers of other ethnic groups, t(526) = 4.22, p
< .001. On average, students taught by White teachers
scored .67 points higher than students taught by teachers of
other ethnic groups. Teacher salary had a significantly positive
impact on student achievement in mathematics, t(526) =
2.24, p = .03. A unit increase in teacher salary ($1,000/
year) was estimated to increase the students’ Algebra II
scores by .72 points.
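
To illustrate how these teacher-level coefficients are read, the sketch below treats them as additive terms in a linear model, each holding the other predictors constant. The coefficient values come from the text; the function name, the 0/1 coding of the indicators, and the $5,000 comparison are hypothetical choices made only for illustration. The result is a model-implied difference in expected score, not a causal claim.

```python
# Teacher-level coefficients as reported in the text (Algebra II score points).
COEFFICIENTS = {
    "male_teacher": -2.16,     # male vs. female teacher
    "white_teacher": 0.67,     # White teacher vs. teachers of other ethnic groups
    "salary_per_1000": 0.72,   # per $1,000/year of teacher salary
}


def predicted_difference(male_teacher: int = 0,
                         white_teacher: int = 0,
                         salary_diff_dollars: float = 0.0) -> float:
    """Model-implied score difference, assuming additive linear effects.

    Indicators are coded 0/1 (an assumption for this sketch); salary is a
    difference in dollars per year between the two teachers being compared.
    """
    return (
        COEFFICIENTS["male_teacher"] * male_teacher
        + COEFFICIENTS["white_teacher"] * white_teacher
        + COEFFICIENTS["salary_per_1000"] * (salary_diff_dollars / 1000)
    )


# Hypothetical comparison: two otherwise similar teachers whose salaries
# differ by $5,000 per year.
print(f"{predicted_difference(salary_diff_dollars=5000):.2f}")
```

Under these assumptions, a $5,000 salary difference corresponds to about 3.6 score points, which is larger in magnitude than the estimated teacher gender difference.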
School Characteristics and Achievement 

School size was not included in the conditional model
because the unconditional model did not suggest that
adding this variable would make a significant difference,
t(106) = .83, p = .41. Additionally, school-level SES, measured
by the percentage of students eligible for the free/
reduced-price lunch program, did not have a significant
impact on student achievement in mathematics, t(106) =
−.98, p = .33.


Discussion 

The findings of this research study both support and add 
to the current literature on the relationships between 
student, teacher, and school effects and mathematics 
achievement. Our findings indicated that the higher
students’ educational aspirations, the higher their
achievement in mathematics. This is consistent with
Fejgin’s (1995) findings.

An additional determination of this study was that the 
hours of television watched negatively impacted student 
achievement in mathematics. This is consistent with find- 
ings of Aksoy and Link (2002); however, Hofferth and 
Sandberg (2001) found that there was not a relationship 
between the number of hours of television watched and 
achievement, as measured by test scores. It is evident that
the findings regarding television and student achievement
vary among research studies. Aligning with the findings
of Bottoms and Carpenter (2003), this study also found
that high expectations from teachers of students
positively affect student achievement. In this study, it was
determined that the higher the teachers’ anticipated course
grade for the student, the higher the student score on the
EOC examination. Thus, teachers’ expectations of the
students’ overall course grade were a good indication of
student success on the state-mandated test. It was also
determined in this study that the more homework
students do, the higher their mathematics achievement.
Because teachers must assign homework before students can
complete it, homework can also be considered an expectation
of the teacher.

Finally, teacher salary had a significantly positive 
impact on student achievement in mathematics. In this 
particular state, a teacher’s salary increases as the number 
of years of experience increases or as the teacher gains a 
higher degree or certification. The reason for this positive 
impact on student achievement may not be related as much 
to salary as it is to having a more experienced mathematics 
teacher. 

Contrary to many research studies (e.g., Coley, 2001; 
Quinn & Spencer, 2001), this study found that gender 
difference in mathematics achievement on standardized 
tests is not statistically significant. Boys did not outper- 
form girls on the Algebra II EOC examination. Awareness 
of potential gender differences and addressing those dif- 
ferences in the classroom could be the reasons for the 
decrease in the mathematics achievement gap between 
boys and girls. Teachers are more commonly differentiating
instruction, which can address gender differences,
rather than teaching all students in the same way. The
reason for this may be that teachers are being prepared
more effectively to respond to potential gender issues in
their classrooms.

There were several findings from this research study that 
were interesting and added a different perspective to the 
current literature. The research uncovered both expected 
and unexpected findings related to homework. It is not 
surprising that the more time a student spends on home- 
work, the greater the level of achievement. It is interesting 
to note, as found in this study, that the type of homework 
that students complete does make a difference. This 
research determined that students who were assigned 
homework to read outside the textbook had significantly 
lower mathematics achievement than any other group.
Also somewhat surprising was that students who were
assigned to work on worksheets performed better than
students who had homework assignments to solve problems
in the textbook. These same students, however,
performed worse than students who were assigned to research
in the library or on the Internet or to write essays or lab
reports. No statistically significant differences were noticed
between students who were doing research in the library or 
on the Internet and students who were writing essays or lab 
reports as homework assignments. These findings regard- 
ing homework indicate that teachers need to be thoughtful 
when assigning homework and take into consideration that 
the type of homework assigned does have an association 
with mathematics achievement. It should also be noted 
that these findings are consistent with research on assess- 
ment that asserts students should be assessed in similar 
ways as they were taught. In this case, more successful 
students were taught in ways similar to the way they were 
to be assessed (Volante, 2004). 

With respect to the use of calculators, this study 
revealed that the more complex the calculator used, the 
higher the score on the Algebra II EOC test. This finding is 
consistent with that of Heller et al. (2005) and is a step 
further than the findings of Ellington (2003) and Hembree 
and Dessart (1986) because neither Ellington (2003) nor 
Hembree and Dessart (1986) looked into specific types of 
calculators. 

The findings of this study are fairly consistent with 
findings of other research. Student behaviors, student char- 
acteristics, and teacher characteristics can have an influ- 
ence on student achievement in the area of mathematics. 
Teachers need to consider these findings when planning 
mathematics instruction. 

Implications

Because we do not believe that female teachers are, in 
general, more effective than male teachers or that White 
teachers are, in general, more effective than those of all 
other ethnic groups, it is important to know why these 
teachers were more effective than their counterparts for 
these students in these schools. Further research is needed 
to explain these phenomena and help avoid perpetuating a 
stereotype that White female teachers are the most effec- 
tive group of teachers. Previous research (Campbell, 
Hombo, & Mazzeo, 2000) showed a positive relationship 
between parent educational degree and student achievement.
Our study confirmed this relationship but took it a
step further by examining the fine-grained differences at each
level of parent educational degree in relation to SES.
It is not surprising that students whose parents have a
four-year college degree and who pay full price for lunch
outperform students who receive free or reduced-price
lunch. What is interesting to note is that students
whose parents did not complete high school and received
reduced-price lunch scored higher than students whose parents
did not complete high school and paid full price for lunch. This
phenomenon warrants further investigation.

Since little research exists regarding the impact that the
type of homework assigned has on mathematics achievement,
it would be beneficial for more research to be conducted.
This study indicated that worksheets were associated with
higher mathematics achievement than textbook problems.
Further research should investigate whether this finding
holds for other populations of students. Teachers as well as
students would benefit from this knowledge.

This study uses a large nationally representative data set (ECLS-K) of 5,181 students to examine the extent to which
exposure to content and instructional practice contributes to mathematics achievement in fifth grade. Using hierarchical
linear modeling, results suggest that more exposure to content beyond numbers and operations (i.e., geometry, algebra,
measurement, and data analysis) contributes to student mathematics achievement, but there is no main effect for
increased exposure to developing numbers and operations. Two significant interactions between exposure to specific
content and racial composition of the classroom emerge. Specifically, as exposure to more diverse content increases, the
classroom mathematics achievement gap between students in predominantly Caucasian classrooms and those in classrooms
composed predominantly of students of color appears to narrow. Findings are discussed with regard to promoting increased
opportunities to learn mathematics for students in racially diverse classrooms.


With national emphasis on accountability and stan- 
dardized testing, schools and teachers face pressure to 
improve achievement for all students. Despite recent 
gains in mathematics achievement scores nationally, 
American students are still struggling and lagging behind 
other nations. Results from the 2011 National Assessment
of Educational Progress (NAEP) find that only 40%
of American fourth-grade students perform at or above 
proficiency in mathematics (National Center for 
Education Statistics [NCES], 2011). In addition, achieve- 
ment gaps continue to exist between African American, 
Hispanic, and Caucasian students (Lubienski, 2002, 
2006; NCES, 2011). These gaps appear to be partially 
due to inequities and disparities in exposure to specific 
areas of mathematics content and instructional practices 
(Bodovski & Farkas, 2007; US Department of Education, 
2008). Examining the amount of exposure to mathemat- 
ics content and instruction students receive can provide 
insights into two factors that lead to differences in 
mathematics performance. In the present study, we use 
data from the Early Childhood Longitudinal Study— 
Kindergarten Cohort (ECLS-K) to examine relations 
between exposure to mathematics content, instructional 
practices, and mathematics achievement, and look at dif- 
ferences in these relations by racial composition of the 
class. 

Opportunity to Learn: Exposure to Mathematics
Content and Instruction

Teachers play a primary role in providing students with 
opportunities to learn mathematics (Pianta, Belsky, Houts, 
& Morrison, 2007); thus, the types of mathematical 
content and opportunities they provide may make a strong 
contribution to students’ mathematics learning in elementary
school. Opportunity-to-Learn (OTL) is a construct
that researchers have used to operationalize the contribution
of instruction to students’ achievement (Carroll,
1963). Tate (2005) characterized OTL as consisting of 
three variables: (a) content exposure and coverage, (b) 
content emphasis, and (c) quality of instructional delivery. 
The content exposure and coverage variables focus on the 
amount of time spent on mathematics topics and the depth 
of coverage of mathematics. The content emphasis vari- 
able focuses on the selection of topics within the math- 
ematics curriculum and the selection of students for basic 
skills instruction or for higher order skills instruction, 
while the quality of instructional delivery variables focus 
on pedagogical strategies. 

Research suggests that greater opportunities to learn are 
related to gains in achievement (Nye, Konstantopoulos, & 
Hedges, 2004). In other words, students who receive more 
exposure to a topic and are given ample time to practice 
are more likely to process the material than students who 
receive less exposure (Hiebert & Grouws, 2007; Porter,
2002; Rotherham & Willingham, 2009). Allocating more 
time to mathematics content and practice provides stu- 
dents with more opportunities to learn, and as a result, 
may contribute to students’ mathematics achievement 
(Wenglinsky, 2004). However, few large-scale research 
efforts have focused on capturing these opportunity vari- 
ables in combination to examine predictors of student 
learning. This study uses Tate’s (2005) three components 
of OTL to understand the relations between exposure to 
mathematical content and instructional strategies. This 
work is different from previous work because it defines 
instructional content and strategies based on the National 
Council of Teachers of Mathematics (NCTM) Content and 
Process Standards and the Common Core Standards, as 
explained below. 


Looking Into the Mathematics Curriculum: What 
Should Be Taught? 

Understanding classroom characteristics that predict 
mathematics achievement is an area of strong interest 
for researchers, policy makers, and practitioners (NMP, 
2008). Following the push for reform in mathematics, 
policies have focused on ways to improve the quality of 
instruction that is provided to all students. NCTM’s Prin- 
ciples and Standards for School Mathematics (2000) 
suggest that high quality mathematics instruction should 
focus on five process standards (problem solving, reason- 
ing and proof, communication, connections, representa- 
tion) that are interlinked with the five mathematics 
content standards (number and operations, algebra, 
geometry, measurement, and data analysis and probability).
Building on these standards, the Common Core State
Standards Initiative (2010) emphasizes mathematical processes
and proficiencies and provides specific critical
content areas at individual grade levels. Therefore, mathematics
instruction should emphasize not only how teachers
teach (instructional practices) but also what teachers teach
(mathematical content).

The elementary mathematics curriculum and content is 
spiraled throughout the grades with increasing complexity, 
to establish progressively deeper understanding of content 
over time. For example, the NCTM and Common Core 
Standards for Mathematics (CCSM) K-5 standards
provide students with a solid foundation in whole
numbers. By the fifth grade, students should receive
broader exposure to content such as geometry and proportionality,
which set the foundation for algebra and higher
level mathematics. 

Many states have developed content-specific curriculum
guides and standards for teachers, reflecting what is often 
measured on standardized achievement tests; however, it 
appears that there may be a lack of alignment between the 
intended curriculum set by state and national standards 
and the enacted mathematics content and practices being 
taught by teachers. Research indicates large variation in 
the United States with respect to the exposure to mathematics
content that teachers provide to students (Swanson
& Stevenson, 2002). Nationally, teachers report teaching
only 80% of the content in their mathematics curriculum
(Agodini et al., 2009). If teachers consistently omit 20% of
the mathematics content each year, it is important to 
understand what concepts teachers overemphasize and 
which concepts are usually neglected. Therefore, examin- 
ing what topics are covered in mathematics classrooms, 
nationally, could help explain why some students are not 
excelling in certain aspects of mathematics. Unfortunately, 
there is little research exploring how often teachers are 
teaching specific content strands or how emphasizing 
certain content standards of mathematics over others can 
impact student achievement. 


Developing Numbers and Operations 

Developing an understanding of numbers and opera- 
tions has been identified as “the heart of mathematics” 
(Kilpatrick, Swafford, & Findell, 2001, p. 2). According to 
the NCTM standards, instruction in numbers and opera- 
tions should enable students to “understand numbers, their 
relationships, and number systems; understand meanings 
of operations and how they relate to one another; and 
compute fluently and make reasonable estimates” 
(National Council of Teachers of Mathematics (NCTM), 
2000, p. 78). A large percentage of teachers in schools 
report overemphasizing numbers, algorithms, and basic
facts, and teaching mathematics in a basic way (Wenglinsky,
2004). In addition, many teachers use textbooks and pre- 
pared procedural problems as the focus of the lesson for 
teaching numbers and operations, as opposed to providing 
challenging tasks for students. 

Despite this overemphasis on building number sense, 
research findings are mixed with regard to the contribu- 
tion of exposure to numbers and operations on student 
achievement. Guarino, Hamilton, Lockwood, and Rathbun 
(2006) and Bodovski and Farkas (2007) both found that 
teacher-reported frequency of emphasizing numbers and
operations in kindergarten was positively related to
achievement gains. However, other studies have found
increased exposure to instruction emphasizing numbers
and operations to be detrimental to student achievement
(Wenglinsky, 2004). For example, D’Agostino (2000)
found that by the fourth grade, overemphasizing numbers
and operations was no longer related to improved student
achievement. Instead, incorporating balanced instruction
in all content areas was the most effective method for
improving student mathematics achievement. This suggests
that while developing skills related to numbers in the
early grades is important, older students may need expo- 
sure to broader and deeper levels of content to improve 
mathematics achievement and understanding. This leads 
us to investigate the contribution of teaching content other 
than numbers and operations. 


Teaching Content Beyond Numbers and Operations 

In addition to numbers and operations, the NCTM and 
CCSM standards outline four other content strands 
(algebra, geometry, measurement, and data analysis and 
probability). In the upper elementary grades, approxi- 
mately equal curriculum emphasis should be given to all 
five content areas, only one of which is number (NCTM, 
2000, p. 30). However, NAEP data reveal that very little 
time is dedicated to teaching these other content areas 
(Wenglinsky, 2004). As a result, many students are denied 
adequate amounts of instruction on content beyond 
numbers and operations, especially related to functions 
and geometry. Consequently, U.S. students tend to perform 
poorly on items related to these additional content standards:
American students’ performance on measurement,
geometry, and proportion was well below the international
average, whereas their performance on numbers and operations
was on par with it (Kilpatrick et al., 2001). One potential way to improve
mathematics achievement may be to increase the amount 
of exposure to this additional content, and in fact, research 
has found positive benefits to exposing students to more 
instruction in the content areas beyond numbers and 
operations (specifically geometry) (Lubienski, 2006; 
Wenglinsky, 2004). 


Instructional Practices for Teaching Mathematics 

In addition to content, the use of instructional practices 
in mathematics may contribute to student achievement. 
Instructional practices do not focus on the content of 
mathematics but rather on the processes and methods 
of providing instruction (NCTM, 2000). Mathematics
instruction should provide students with opportunities to 
build mathematical knowledge through problem solving in 
context, applying and adapting a variety of problem 
solving strategies, and reflecting on the problem-solving 
process (NCTM, 2000). While learning mathematics, children
should think critically about formulating, representing,
and solving mathematical problems. Further, students
should think logically about problems, with the goal of 
being able to explain, justify, reflect on their work with 
others, and discuss their reasoning and thinking. Writing 
in mathematics also helps students reflect on their thinking 
and consolidate their ideas about the topic. Visual repre- 
sentations should be used as a tool to solve problems but 
also to communicate students’ thought processes and 
mathematical ideas to others. Connections and applica- 
tions of the mathematics to other concepts and the real 
world should be made to produce deeper understanding. 
Consequently, exposure to instructional practices provides
opportunities for students to interconnect mathematical
ideas within mathematics and outside of mathematics,
and helps develop improved mathematical understanding
(NCTM, 2000). 


Inequalities in Mathematics Instruction: Racial 
Composition of Classrooms 

Ideally, high-quality mathematics should be equitable 
and available to all students, regardless of background 
(Schoenfeld, 2002). However, gaps in achievement persist 
between African American, Hispanic, and Caucasian 
students (Lubienski, 2002; NCES, 2011). Further, these 
gaps appear to widen as students progress through school. 
One potential contributor to the gap in achievement 
could be inequities in the opportunities to learn mathematics.
Research suggests that the amount of exposure
to content and instruction that teachers provide to their
students varies depending on whom they are teaching
(Wang, 2010). 

Serious disparities appear to exist between achievement 
and exposure to specific areas of content and instructional 
practice related to race and income (NMP, 2008; Wang, 
2010). Several studies have suggested that African Ameri- 
can and Hispanic students’ content exposure and emphasis 
in elementary and middle grades have primarily consisted 
of numbers and operations, and computational skills, with 
little depth in other mathematics strands (Bodovski & 
Farkas, 2007; Lubienski, 2002). In contrast, teachers in 
schools and classrooms with fewer students of color and 
students not on free or reduced lunch often spend less time 
focusing on basic skills, procedural instruction, and 
numbers and operations, leaving more time to focus on 
additional content and conceptual development and higher 
order thinking skills. 

Two studies used fourth-grade NAEP data to examine 
which content and instructional practices were most effec- 
tive for improving student mathematics achievement and 
reducing the racial achievement gap. Wenglinsky (2004)
examined the relations of 23 content and instructional
variables to student achievement in three different
samples: all students (irrespective of race), African Americans,
and Latinos. Overall, he found that practicing basic
skills and routine exercises was related to improved
achievement, while frequent testing and emphasizing facts
were related to reduced scores on the NAEP. Different
patterns of results for African American and Hispanic
students were found: exposing African American students
to content such as geometry, measurement, and estimation
was particularly beneficial, while focusing on data analysis
proved to be most helpful for Hispanic students.
Wenglinsky concluded that the variance for the within- 
school racial gap could be fully explained by these content 
and instructional variables. Lubienski (2006) used factor 
scores (calculator use, facts and skills, collaborative 
problem solving, non-number curricular emphasis, writing 
about mathematics, and manipulative use) to predict 
decreases in the achievement gap. Although she reported 
similar patterns of exposure to instruction, weak findings 
suggested that Wenglinsky’s conclusion was unwarranted. 
Due to the cross-sectional nature of this NAEP data, these 
studies have several limitations, including not controlling 
for previous student achievement and focusing solely on 
school differences in the achievement gap at the child 
level, as opposed to the differences in content, instruc- 
tional practice, and achievement in relation to the racial 
composition of the classroom.

Past studies have suggested that classroom composition 
may be an important variable in determining which 
instructional and content variables are most related to 
achievement. Bodovski and Farkas (2007) investigated 
differences in content and instructional practices between 
race and social class in kindergarten, and found that teach- 
ers with higher percentages of African American students 
emphasized more procedural skills rather than conceptual 
understanding. However, when more exposure to math- 
ematics instruction was provided for students in class- 
rooms with higher percentages of African American 
students, their mathematics achievement increased. It has 
also been reported that teachers whose classroom compo- 
sition is more white and affluent may be more likely to use 
higher level instructional practices better aligned with the 
NCTM standards, while teachers whose classroom com- 
position is more diverse and less affluent may be more 
likely to provide more basic instruction on number and use 
less effective practices (Schoen, Cebulla, Finn, & Fi, 
2003). In addition, schools with higher instructional
spending were able to support higher quality instruction,
while schools with low instructional spending had teachers
who were less knowledgeable about the standards and
focused on basic skills and operations (Swanson &
Stevenson, 2002). It is plausible that exposure to a narrow 
band of mathematics content and practices denies many 
students of color the opportunity-to-learn mathematics, 
and by extension, these students may demonstrate lower 
performance. 

Taken together, little is known about how exposing stu- 
dents to different content strands or instructional practices 
contributes to fifth-grade student mathematics achieve- 
ment. In addition, results appear to be small and inconclu- 
sive regarding whether exposure to particular content 
contributes to increased mathematics achievement. While 
past work has investigated similar relationships (Bodovski 
& Farkas, 2007; Lubienski, 2006; Wenglinsky, 2004), 
there have been no studies conducted in upper elementary 
school grades that account for important student and class- 
room level variables (such as opportunities to learn, racial 
composition, and previous achievement). A closer look at 
the contributions of exposing students to varying content 
in the fifth grade could be helpful in understanding low 
mathematics achievement, especially among classrooms 
with large numbers of students of color. 


The Present Study 

The present study exploits data from ECLS-K, a large 
nationally representative data set, to examine the extent to 
which exposure to specific mathematical content and 
instructional practice contributes to mathematics achieve- 
ment scores in fifth grade. Three research questions were 
posed. First, does exposure to developing numbers and 
operations, beyond numbers and operations content, 
and/or instructional practices predict students’ fifth-grade 
mathematics achievement? We hypothesize that exposure 
to both numbers and operations and content beyond 
numbers and operations (i.e., geometry, algebra, measure- 
ment, and data analysis) would independently contribute 
to higher mathematics achievement, even after controlling 
for student demographic variables. Second, does exposure 
to numbers and operations interact with the classroom 
racial composition to predict students’ fifth-grade math- 
ematics achievement? Third, does exposure to beyond 
numbers and operations interact with the racial composi- 
tion of the classroom to predict mathematics achievement? 
We hypothesize that exposing students in classrooms with 
higher percentages of students of color to more instruction 
focusing on diverse content (i.e., geometry, algebra, mea- 
surement, and data analysis) will positively contribute to 
their achievement. 

Methods
Participants 

Participants in this study were selected from the
publicly available ECLS-K data set, a longitudinal
sample that was sponsored by the U.S. Department of
Education and NCES. A sample of 22,782 kindergarteners
from both public and private schools was selected in the
fall of 1998 to participate in the study. A total of 17,487 of
these kindergarten students (from 3,500 classrooms in
1,280 schools) were followed through the end of high
school. (For more information about the ECLS-K study
and data collection process, refer to
http://nces.ed.gov/ecls/kindergarten.asp.)

The current study uses a subset of the fifth-grade par- 
ticipants from the sixth wave of the ECLS-K original 
longitudinal sample. This sample was reduced as a result 
of excluding certain groups of children from data collection
(children who moved out of the country, children
whose parents refused to cooperate, children who were 
excluded in earlier waves, and children who did not have 
first or third-grade data). In addition to the lower rate of 
subsampling in the fifth grade (Tourangeau, Nord, Le, 
Pollack, & Atkins-Burnett, 2006), we only included par- 
ticipants whose teachers had completed the mathematics 
questionnaire in the fifth grade. Participants in the present 
study were 5,181 fifth graders (2,583 male, 2,598 female) 
from 1,523 schools. Of the students, 3,034 (58.6%) were 
Caucasian, 574 (11.1%) were Black, 922 (17.8%) were 
Hispanic, 344 (6.6%) were Asian, and 301 (5.8%) were of 
another ethnic minority. Levels of maternal education
varied, with 1,869 (33%) holding a high school diploma or
below, 1,411 (27.2%) attending some college, 891 (17.2%)
with a bachelor’s degree, and 519 (10%) holding a graduate
degree. Children were taught by 2,838 teachers who
were identified as the study child’s primary fifth-grade
mathematics teacher. With regard to preparation, 1,583
(56.7%) had a bachelor’s degree, 996 (35.7%) had a
master’s degree, and 211 (7.6%) had completed more than
a master’s degree. Teaching experience averaged 13.95
years, with a range of 1 to 35 years (SD = 10.06). Of the
1,523 schools, 1,281 (84.1%) were public and 996 (67%)
were located in a suburban region.

Procedures 

Data were collected from three sources: parents, teach- 
ers, and trained research assistants during the 2003-2004 
school year. The participants’ parents completed a demographic
questionnaire providing information about their
income-to-needs ratio, maternal education, and their child’s
ethnicity and gender. Classroom teachers completed questionnaires
about themselves, providing demographic information
and details about their education, instructional
practices, and teaching experience. Students completed a 
mathematics achievement test, administered by trained 
research assistants. 

Measures

Child and family demographic measures. Student 
demographic information (gender, race) was taken from 
information confirmed in the parent interview. Socioeco- 
nomic status (SES) was computed at the household level 
using data from the parent interview in the spring of 2004 
(fifth grade). Households whose income was below the 
threshold as determined by the 2003 census data were 
considered to be living in poverty. The components used to 
create the SES variable included: father/mother/guardian’s 
education, occupation, and household income. The SES 
variable was then divided into five quintiles, with SES 5 
representing the participants with the highest SES. 

Fifth-grade mathematics achievement. The fifth- 
grade mathematics assessment was directly administered 
to children in the spring of 2004 using workbooks with 
open-ended questions. The assessments addressed all five
content strands (numbers and operations;
measurement; geometry; data analysis, statistics, and
probability; and patterns, algebra, and functions), with
numbers and operations being the largest strand tested.
Conceptual understanding, procedural knowledge, and problem
solving were assessed in each of the content strands. In addition, some
of the items required students to apply knowledge from 
more than one strand. Mathematics achievement was mea- 
sured through Item Response Theory (IRT) scale scores in 
order to facilitate comparisons over time. 

Teacher-reported fifth-grade mathematics questionnaire.
Teachers of sampled children were asked to
respond to 24 instructional practice and content items 
taken from the revised child-level fifth-grade mathematics 
teacher questionnaire (U.S. Department of Education, 
2004). Teachers were asked to report answers for each of 
the 24 items relating specifically to students’ exposure to 
content, skills, and instructional practices, using the following
question stem: “How often does the child identified
on the cover of this questionnaire engage in ____
as part of mathematics instruction?” (Refer to the Appendix
for a list of items.) Teachers were asked to circle their
response for each of the 24 items on the following scale (1
- Almost every day; 2 - Once or twice a week; 3 - Once or
twice a month; and 4 - Never or hardly ever). These items
were then reverse scored so higher exposure to each item 
was represented by higher numbers. 
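
The reverse-scoring step can be illustrated with a short sketch. It assumes the four-point scale described above, so that a circled response r becomes 5 - r. The function name `reverse_score` is hypothetical and is not part of the ECLS-K materials.

```python
def reverse_score(response, scale_min=1, scale_max=4):
    """Flip a Likert-type response so that higher numbers mean more exposure."""
    if not scale_min <= response <= scale_max:
        raise ValueError(f"response {response} is outside {scale_min}-{scale_max}")
    return scale_min + scale_max - response


# Questionnaire coding: 1 = almost every day ... 4 = never or hardly ever.
circled = [1, 2, 3, 4]
recoded = [reverse_score(r) for r in circled]
print(recoded)  # [4, 3, 2, 1]
```

After recoding, 4 corresponds to "almost every day" and 1 to "never or hardly ever," which matches the coding listed in the Appendix.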

According to a study of the measure’s structural validity 
using a promax rotated exploratory factor analysis (EFA) 
(Ottmar, Konold, Berry, Grissmer, & Cameron, 2012), 19
items from this measure (N = 5,181) revealed a clean
pattern of simple structure across three factors. Five items
were dropped from the original questionnaire due to low
loadings and item reliability. The remaining 19 items were
found to load onto three distinct factors and have relatively
good internal consistency reliability (developing numbers
and operations [DNO], α = .75; beyond numbers and
operations [BNO], α = .78; and instructional practices for
teaching mathematics [IPTM], α = .80). A composite
score for each factor was created by summing the corre- 
sponding items and taking the mean. This approach was 
used to allow for interpretation of the composites, use the 
original metric, and compare differences between expo- 
sures to different factors. 
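
As a minimal sketch of the two computations described above, the following code averages one teacher's item responses into a composite and computes Cronbach's alpha for a set of items. The response values are invented for illustration and are not ECLS-K data, and the function names are hypothetical.

```python
from statistics import mean, variance


def composite_score(item_responses):
    """Mean of the reverse-scored items belonging to one factor (1-4 metric)."""
    return mean(item_responses)


def cronbach_alpha(rows):
    """Cronbach's alpha; each row holds one respondent's answers to k items."""
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Invented responses from four teachers to three items of one factor.
responses = [
    [4, 4, 3],
    [3, 3, 3],
    [2, 3, 2],
    [4, 3, 4],
]
print([round(composite_score(r), 2) for r in responses])
print(round(cronbach_alpha(responses), 2))
```

Because the composite is a mean rather than a sum, it stays on the original 1-4 metric, which is what allows exposure to be compared across factors with different numbers of items.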

Approach to Analysis 

Our first aim was to determine how exposure to different 
mathematics content and instructional practice contributes 
to students’ achievement. We used an approach similar to
that of Bodovski and Farkas (2007) and Lubienski (2006):
conducting factor analyses to identify factors of content, skills,
and practices that load together (Ottmar et al., 2013), and
then conducting regression analyses to determine the
achievement effects associated with these factors. We
used hierarchical linear modeling (HLM; Raudenbush &
Bryk, 2002) to estimate the relations between student mathematics
achievement in the fifth grade and the three constructs
of interest (DNO, BNO, and IPTM). Because some
children were nested within classrooms, a two-level HLM
model was employed. Intra-class correlations (ICCs) from the
unconditional model (i.e., no predictors at the child or
classroom level) were calculated and evaluated to determine
the proportion of variance attributable to the child and
classroom levels. The ICC of .46 indicates that 46% of the
variance was at the classroom level, further supporting our
decision to use a two-level model.
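
The ICC reported above follows directly from the variance components of the unconditional model. A minimal sketch, using the model 1 values given in Table 2 (classroom-level variance 224.63, child-level variance 261.59); the function name is hypothetical.

```python
def intraclass_correlation(between_variance, within_variance):
    """Share of total variance that lies between classrooms (level 2)."""
    return between_variance / (between_variance + within_variance)


# Model 1 (unconditional) variance components from Table 2.
classroom_variance = 224.63
child_variance = 261.59

icc = intraclass_correlation(classroom_variance, child_variance)
print(round(icc, 2))  # 0.46
```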

Five two-level HLM models were conducted to test our 
three research questions. Model 1 tested the unconditional 
model predicting achievement, with no predictors added. 
Next, child level predictors were added to model 2. At the 
child level, variables included third-grade mathematics 
achievement, gender, race, maternal education, and socio- 
economic status. In model 3, both child and classroom 
level predictors were added to the model to estimate the 
main effects of exposure to developing numbers and 
operations, beyond numbers and operations content, and 
instructional practices for teaching mathematics. Predic- 
tors at the classroom level included years of teaching expe- 
rience, teacher education (bachelor’s or master’s degree), 
percentage of Caucasian, African American, and Hispanic 
students in the classroom, as well as the location of the
school (urban, rural, or suburban), and whether the school 
was public or private. 

Our second question was concerned with the interaction 
between classroom racial composition and exposure to 
developing numbers and operations content. The current
study examines differences in mathematics instruction
by the racial composition of the classroom, using the
percentage of Caucasian students in the classroom rather
than the individual race of the student. Thus, model 4 tested
the interaction of
percentage of Caucasian students in the classroom and
exposure to DNO at the classroom level (level 2). This
particular interaction was of interest, given past
research suggesting that students in classrooms with high
proportions of students of color are disproportionately
overexposed to basic numbers and operations
and rarely exposed to other content areas. Our third
research question was concerned with the interactions 
between classroom racial composition and exposure to 
beyond numbers and operations content. Model 5 tested 
the interaction of percentage of Caucasian students in the 
classroom and exposure to BNO at the classroom level 
(level 2). 
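
One way to write the two-level model with the model 4 interaction is sketched below. The notation is an assumption made for illustration, since the article does not print its equations: $Y_{ij}$ is the fifth-grade score of child $i$ in classroom $j$, $X_{qij}$ are the child-level predictors, $W_{sj}$ are the remaining classroom-level covariates, $\mathrm{PctCauc}_{j}$ is the percentage of Caucasian students in the classroom, and the child-level slopes are assumed to be fixed across classrooms.

```latex
\begin{aligned}
\text{Level 1:}\quad Y_{ij} &= \beta_{0j} + \sum_{q=1}^{Q} \beta_{q} X_{qij} + r_{ij},
  \qquad r_{ij} \sim N(0, \sigma^{2}) \\
\text{Level 2:}\quad \beta_{0j} &= \gamma_{00} + \gamma_{01}\,\mathrm{DNO}_{j}
  + \gamma_{02}\,\mathrm{BNO}_{j} + \gamma_{03}\,\mathrm{IPTM}_{j}
  + \gamma_{04}\,\mathrm{PctCauc}_{j} \\
  &\quad + \gamma_{05}\,(\mathrm{PctCauc}_{j} \times \mathrm{DNO}_{j})
  + \sum_{s=6}^{S} \gamma_{0s} W_{sj} + u_{0j},
  \qquad u_{0j} \sim N(0, \tau_{00})
\end{aligned}
```

Under this notation, model 3 omits the product term, and model 5 replaces it with $\mathrm{PctCauc}_{j} \times \mathrm{BNO}_{j}$.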


Results 
Research Question 1: Exposure to Content 
and Instructional Practices Predicting 
Mathematics Achievement 

We first examined how the content and practice factors
relate to mathematics achievement in the fifth grade.
Descriptive statistics for all variables are reported in
Table 1. Estimates from the five two-level HLM models
predicting fifth-grade achievement, including the interactions of
interest, ICCs, and variance components, are reported in
Table 2. On average, teachers reported
teaching DNO content between three and five days a week
(M = 3.57, SD = .42), but exposed students to BNO
content (geometry, algebra, measurement, data analysis)
less often, approximately one or two times every few
weeks (M = 2.85, SD = .56).

After including child level predictors (model 2), the 
child and classroom level variance was reduced by 67 and 
93%, respectively. Model 3 tested the main effects of 
exposure to DNO, BNO, and IPTM on achievement. 
Eighty-five percent of the variance in student achievement 
was at the child level, while the additional 15% was at the 
classroom level. Including classroom predictors accounted 
for an additional 1% reduction in variance at the child 
level, and a 6% reduction at the classroom level. Results 
Table 1
Means and Standard Deviations for Predictor and Outcome Variables

Variable                                                        M        SD
Level 1 (child-specific)
  Third-grade math achievement                                  [illegible; source reads "92 592162"]
  Gender (0 = male, 1 = female)                                 .50      .50
  Race (1 = Caucasian, 2 = African American, 3 = Hispanic)      2.18     1.79
  Mother education (1 = lower than HS to 5 = grad school)       4.63     1.82
  SES (quintiles 1-5)                                           3.21     1.40
Level 2 (teacher/classroom)
  Teacher education (0 = M.A., 1 = B.A.)                        .56      .50
  Number of years teaching experience                           14.10    10.13
  Location (urban)                                              .61      .49
  Location (rural)                                              .34      .47
  School type (0 = private, 1 = public)                         .89      [illegible]
  % students Caucasian in math class                            .53      [illegible]
  % students African American in math class                     .16      [illegible]
  % students Hispanic in math class                             .13      [illegible]
  Instructional practices for teaching mathematics              2.93     [illegible]
  Developing numbers and operations                             3.61     .40
  Beyond numbers and operations                                 2.87     .56


indicate that greater exposure to content beyond numbers 
and operations (i.e., geometry, algebra, measurement, data 
analysis) contributed to higher achievement, p < .01. 
However, more exposure to numbers and operations or 
instructional practices did not significantly contribute to 
achievement growth, all p’s > .05. 
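
The proportional reductions in variance reported for models 2 and 3 can be checked against the variance components in Table 2. A minimal sketch; the function name is hypothetical and the inputs are the Table 2 values.

```python
def variance_explained(previous_variance, current_variance):
    """Proportional reduction in a variance component between nested models."""
    return (previous_variance - current_variance) / previous_variance


# Variance components from Table 2: (child level, classroom level).
model_1 = (261.59, 224.63)
model_2 = (85.10, 15.59)
model_3 = (84.07, 14.61)

for label, before, after in [("model 2", model_1, model_2), ("model 3", model_2, model_3)]:
    child = variance_explained(before[0], after[0])
    classroom = variance_explained(before[1], after[1])
    print(label, round(child, 2), round(classroom, 2))
# model 2 0.67 0.93
# model 3 0.01 0.06
```

Each reduction is computed relative to the preceding model, which is why the small figures for model 3 are additional to what the child-level predictors already explained.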

Research Question 2: Interaction of Classroom
Racial Composition and Exposure to Developing 
Numbers and Operations 

As displayed in model 4, a significant interaction was
present for exposure to DNO and the percentage of Caucasian
students in the classroom, p < .01 (Figure 1).
Students in classrooms with higher percentages of students
of color had the highest achievement when they were 
exposed to more instruction related to numbers and opera- 
tions content. In contrast, the achievement for students 
who were in classrooms with higher percentages of Cau- 
casian students was similar, regardless of the amount of 
emphasis teachers placed on these number concepts. 

Research Question 3: Interaction of Classroom
Racial Composition and Exposure to Beyond 
Numbers and Operations 

Results from model 5 demonstrate that greater exposure 
to content beyond numbers and operations (i.e., geometry, 
algebra, measurement, data analysis) predicted higher 
mathematics achievement scores, p < .01. The interaction 
of BNO and percentage of Caucasian students in the classroom
was significant in predicting mathematics achievement,
p < .01 (Figure 2). More specifically, students in
classrooms with higher percentages of students of color had
the highest achievement when they were exposed to more
BNO instruction. However, when these students were
exposed to very little instruction on these concepts, their
achievement scores were lowest. In addition, the achievement
of students (regardless of classroom racial composition)
appeared to improve with more exposure to these
concepts.


Discussion 

Several main findings emerged from this study of fifth- 
grade students. First, teachers generally report exposing 
their students to far more numbers and operations con- 
cepts than other content areas. Second, a main effect indi- 
cated that increased exposure to content beyond numbers 
and operations (i.e., geometry, algebra, measurement, data 
analysis) is related to improved achievement. However, no 
significant effects were found for exposure to numbers and 
operations or instructional practices. Next, there was a 
significant interaction between classroom composition and 
exposure to numbers and operations, where students in 
classrooms with higher percentages of Caucasian students 
did not show the same benefits as their peers in classrooms 
with high percentages of students of color from increased 
exposure to basic skills. Finally, there was a significant 
interaction between classroom composition and exposure 
to geometry, algebra, measurement, and data analysis, 
where students in classrooms with higher percentages of 
students of color demonstrated improved achievement 
when spending more time in these under-taught content 
areas. 

Although national curriculum standards recommend 
that teachers provide a balance of mathematics concepts, 
this does not appear to be occurring in the classroom. This 
study supports past findings that American teachers gen- 
erally overemphasize instruction related to number and 
operations, far more than other content areas (Kilpatrick, 
Swafford, & Findell, 2001). When teachers overemphasize
numbers and operations, they potentially sacrifice time 
that they could be spending on other concepts, such as 
geometry, algebra, measurement, data analysis, and statis- 
tics. Although overemphasizing number concepts is the 
national norm, students need an understanding of all 
content areas to develop mathematics proficiency. Our 
findings suggest that exposing students to broader
instruction, developing both concepts of numbers and
operations and concepts related to geometry, algebra, measurement,
and statistics, contributes to mathematics
achievement. However, the main effect for BNO (and lack
Table 2
Hierarchical Linear Model Estimates for Models Predicting Mathematics Achievement

Parameter                                          Model 1         Model 2         Model 3         Model 4         Model 5
Intercept                                          111.58 (.40)    109.59 (.57)    111.64 (.84)    111.70 (.84)    111.54 (.84)
Level 1 (child specific)
  Third-grade math achievement                                     [illegible]     [illegible]     .81** (.01)     [illegible]
  Gender                                                           −.52 (.30)      [illegible]     [illegible]     [illegible]
  Race                                                             [illegible]     [illegible]     [illegible]     .23* (.10)
  Mother education                                                 .10 (.13)       [illegible]     .12 (.12)       [illegible]
  SES                                                              [illegible]     [illegible]     [illegible]     [illegible]
Level 2 (teacher/classroom)
  Teacher education                                                                −.22 (.31)      [illegible]     [illegible]
  Number of years teaching experience                                              −.39 (.68)      [illegible]     −.43 (.68)
  Location (urban)                                                                 −.06 (.39)      [illegible]     [illegible]
  Location (rural)                                                                 [illegible]     [illegible]     [illegible]
  School type (public)                                                             −1.23** (.47)   −1.20** (.47)   [illegible]
  % students Caucasian in math class                                               2.26* (.96)     [illegible]     [illegible]
  % students African American in math class                                        −2.01 (2.29)    [illegible]     [illegible]
  % students Hispanic in math class                                                2.66 (1.77)     2.35 (1.76)     2.40 (1.76)
  Instructional practices for teaching mathematics                                 −.03 (.36)      −.03 (.36)      −.60 (.36)
  Developing numbers and operations                                                .89 (.48)       [illegible]     .75 (.48)
  Beyond numbers and operations                                                    1.97** (.38)    [illegible]     [illegible]
  % Caucasian × developing numbers and operations                                                  [illegible]
  % Caucasian × beyond numbers and operations                                                                      [illegible]
Variance components
  Child level                                      261.59          85.10           84.07           84.04           84.45
  Teacher level                                    224.63          15.59           14.61           14.44           14.45
  ICC Level 1                                      .54             .85             .85             .85             .85
  ICC Level 2                                      .46             .15             .15             .15             .15
  % explained at child level                                       .67             .01             .00             .00
  % explained at teacher level                                     .93             .06             .01             .01
*p < .05; **p < .01.
Figure 1. Interaction of classroom racial composition by exposure to developing
numbers and operations content. [Graph of mathematics achievement by exposure
level (low exposure: once or twice a month; medium exposure: once or twice a week;
high exposure: almost every day), with separate lines for classrooms in which more
than 70%, 50%, and under 30% of the class are students of color.]


of findings for DNO) suggests that increasing exposure to 
these other concepts may be more promising for improv- 
ing children’s mathematics achievement. 

Figure 2. Interaction of classroom racial composition by exposure to beyond
numbers and operations content. [Graph: exposure levels are low (once or twice a
month), medium (once or twice a week), and high (almost every day).]

Findings suggest that content areas focusing on number
appear to be emphasized more often in American classrooms,
while the other content areas receive less emphasis
and exposure. However, the areas that
tend to be underemphasized (i.e., geometry, algebra, probability
and statistics, and measurement) were found to
positively contribute to student mathematics achievement.
These exposure findings could help explain why American
performance on numbers and operations was equal to that 
of other industrialized nations, but well below the interna- 
tional average for measurement, geometry, and propor- 
tion (Kilpatrick et al., 2001; Provasnik, Kastberg, Ferraro, 
Lemanski, Roey, & Jenkins, 2012). Perhaps the low per- 
formance of some students is a result of never being pre- 
sented with adequate opportunities to learn these tested 
concepts. Consequently, students do not learn mathemat- 
ics content unless they are exposed to it and have sufficient 
opportunities to learn and practice it (Rotherham & 
Willingham, 2009). 

There is much variability in how much teachers report 
exposing students to varying content domains, suggesting 
that some students may be shortchanged in mathematics. 
The inequities in mathematics education with regard to
exposure may limit some students’ opportunity to learn.
Our study suggests that, at the classroom level, as expo- 
sure to more diverse content increases, the gap among 
students in predominately Caucasian classrooms versus 
classrooms composed predominately of students of color 
may narrow. In addition, mathematics achievement for all 
students appears to increase. More exposure to geometry, 
algebra, measurement, and data analysis for students in 
more diverse classrooms is associated with higher gains in 
children’s mathematics achievement. Our results provide 
promising evidence that increased exposure to concepts
related to all mathematics content could potentially
decrease gaps in achievement scores, especially for classrooms
composed predominantly of students of color.
However, the patterns of increased exposure to numbers
and operations in this sample looked very different
depending on the racial composition of the classroom.
While increased exposure to DNO concepts was related to
improved achievement for students in classrooms composed
predominantly of students of color, additional exposure for students in
less diverse classrooms (mostly Caucasian students) did
not improve achievement. Therefore, teachers may need to
tailor the concepts they emphasize in their instruction
to their students’ prior opportunities to learn mathematics
and to the population of students in their classroom.

It has been reported that students in classrooms with
high numbers of students of color are more likely to
receive instruction on basic number and operations
and are rarely taught concepts related to other content strands
(Lubienski, 2006). The significant interaction for DNO
and classroom composition suggests that exposing stu- 
dents in more diverse classrooms to numbers and opera- 
tions content may be partially beneficial to moving 
students further along in mathematics. However, this does 
not imply that these students should only receive increased
instruction on basic numbers and operations. These students
must also receive increased exposure to instruction
on other, more complex mathematical concepts. By
denying minority students opportunities to learn diverse
mathematics concepts, teachers may be contributing to a
gap in mathematics content exposure, thus contributing to
the larger gap in achievement. Due to the patterns of
de-emphasizing these content areas year after year, stu- 
dents may never be provided the opportunities to develop 
and practice these skills, further leading to inequities. 
Perhaps by increasing these students’ opportunities to 
learn diverse mathematical concepts during elementary 
school, their mathematics achievement and trajectories 
could improve, potentially serving as one factor in narrow- 
ing the gap in achievement in mathematics. 

These findings have implications for research, policy, 
and practice in the mathematics classroom. For research- 
ers, this study provides evidence that including opportu- 
nity to learn variables is valuable to understanding student 
mathematics achievement. This study provides further
support for Wenglinsky’s (2004) and Lubienski’s (2006)
recommendation to administrators and policy makers that
teachers should place a greater emphasis on teaching content
beyond numbers and operations. This study can also help
inform policy decisions about the challenges of imple- 
menting curriculum standards. For teachers and adminis- 
trators, this study could be used to inform decisions related 
to professional development through modifications of 
pacing guides to ensure that students are receiving a 
balance of instruction in all content areas of mathematics. 

Future Research and Limitations

There are several limitations of this study that are com- 
monly reported in secondary large-scale data analysis 
studies. First, relying solely on teacher self-reports can 
cause many problems when reliably measuring content 
and instructional practices in the classroom, including 
fidelity, validity, and reliability (D’Agostino, 2000). For 
example, teachers may interpret the meaning of items differently
than intended (i.e., not having a common definition
for the different strands of content). Due to the
complex intertwined nature of mathematical content and 
practice (NRC, 2000), it can be difficult for teachers to 
tease out individual strands when thinking about their 
practice. Despite these challenges, studies have found that 
teachers are fairly accurate in reporting survey data over 
time and can distinguish the relative amount of time spent 
using different classroom instructional practices (Mayer, 
1999). Another limitation is that, because of the four-category
response scale, the exact amount of time teachers
spend teaching particular content cannot be determined.
While teacher-report
data are not ideal, their use is balanced by the many challenges
of collecting nationally representative observational
data sets from thousands of participants over time.

Although beyond the current scope of this article, we 
acknowledge that there are teacher and classroom vari- 
ables, in addition to content emphasis and exposure to 
instruction, which may contribute to improved mathematics
achievement (e.g., mathematics instructional quality,
teacher beliefs and efficacy, and pedagogical content
knowledge). While the inclusion of these variables would
further tease out the contribution of exposure to content and 
instructional practice on achievement, this study is limited 
by the variables that were collected in the ECLS-K study. 
For example, the teacher questionnaire only includes the 
frequency of content and practices, leaving out any data 
about the quality of instruction or fidelity of their instruc- 
tional practices. When looking at teacher instructional 
practices and content individually, teachers can claim they 
are using certain instructional practices and teaching con- 
cepts, but may, in fact, be implementing very different 
approaches. Thus, our focus on the quantity (amount of 
exposure) to content and instructional practices should not 
be construed to reflect variations in quality. Future research 
could address these limitations by conducting large-scale 
observational studies that aim to collect data and examine 
how exposure, as well as these other variables mentioned 
above, could individually and collectively contribute to 
students’ mathematics achievement. 

Appendix
Mathematics Teacher Questionnaire Items and Factors

“How often does the child identified on the cover of this questionnaire engage in ____ as part of mathematics
instruction?” (1 - Never or hardly ever; 2 - Once or twice a month; 3 - Once or twice a week; and 4 - Almost every day).

Factor 1: Instructional Practices for      Factor 2: Developing Numbers               Factor 3: Beyond Numbers
Teaching Mathematics                       and Operations                             and Operations

Discuss solutions to math problems with    Learning skills and procedures needed to   Geometry
other children                             solve routine problems
Work and discuss math problems that        Learning math and concepts                 Measurement
reflect real life
Solve math problems in small groups or     Understanding place value with whole       Data analysis, statistics, and probability
with a partner                             numbers
Write a few sentences about how to         Numbers and Operations                     Work with measuring instruments (e.g.,
solve a math problem                                                                  rulers)
Use visual representations (e.g.,          Making reasonable estimates of             Algebra and functions
diagrams, tables, models)                  quantities
Learning how to communicate ideas in       Developing reasoning and analytical        Performing operations with fractions
mathematics effectively                    ability to solve problems
Work with manipulatives (e.g., geometric 
shapes) 


Taking an Item-Level Approach to Measuring Change With the
Force and Motion Conceptual Evaluation: An Application of Item 
Response Theory 


Robert M. Talbot III 


University of Colorado Denver 


In order to evaluate the effectiveness of curricular or instructional innovations, researchers often attempt to measure 
change in students’ conceptual understanding of the target subject matter. The measurement of change is therefore a
critical endeavor. Often, this is accomplished through pre-post testing using an assessment such as a concept inventory, 
and aggregate test scores are compared from pre to post-test in order to characterize gains. These comparisons of raw 
or normalized scores are most often made under the assumptions of Classical Test Theory (CTT). This study argues that 
measuring change at the item level (rather than the person level) on the Force and Motion Conceptual Evaluation 
(FMCE) can provide a more detailed insight into the observed change in students’ Newtonian thinking. Further, such 
an approach is more warranted under the assumptions of Item Response Theory (IRT). In comparing item-level 
measures of change under CTT and IRT measurement models, it was found that the inferences drawn from each analysis 
are similar, but those derived from IRT modeling stand on a stronger foundation statistically. Second, the IRT approach 
leads to analyzing common item groupings which provide further information about change at the item and topic level. 


The measurement of change is necessary for evaluating 
the effectiveness of instructional innovations in educa- 
tional contexts. Without measures of change in students’ 
conceptual understanding, we lack strong foundations for 
making inferences about existing and reform-oriented 
instructional strategies and curricula. Although measures 
alone are not sufficient for making these inferences, they 
are a necessary part of such research. However, measure- 
ment itself is difficult. It is part art and part science. There 
is no single test score that can unequivocally tell us every- 
thing we need to know about students’ conceptual under- 
standing. Moreover, attempting to measure change in 
conceptual understanding presents its own problems. Edu- 
cational researchers have been tackling these problems for 
years (cf. Cronbach & Furby, 1970; Willett, 1988-89), and 
lively discussions are still taking place regarding the mea- 
surement of change. Challenging as it is, measurement of 
change must be undertaken at all levels of instruction and 
across all subjects. In our current educational culture of 
accountability it is important for us to attempt to measure 
change in students’ conceptual understanding. 

In science instruction, concept inventories are often 
administered to students pre- and post-instruction in order 
to characterize change in conceptual understanding. The 
Force and Motion Conceptual Evaluation (FMCE; 
Thornton & Sokoloff, 1998) is one such concept inven- 
tory. It is often used in introductory physics courses to 
evaluate students’ ability to think in a Newtonian fashion. 
The pre/post administration of this and other concept tests, 
such as the Force Concept Inventory (FCI) (Hestenes, 
Wells, & Swackhamer, 1992), is used in physics courses 
in order to provide evidence for making inferences about 
changes in students’ ability to think in Newtonian terms. 
These measured changes are then taken to be indicators of 
course efficacy. The physics education research (PER) 
community has been using these types of assessments for 
a number of years (e.g., Bonham, Deardorff, & Beichner, 
2003; Finkelstein & Pollock, 2005; Meltzer, 2002; 
Pollock, 2004; Smith & Wittmann, 2007; Van Domelen & 
Van Heuvelen, 2002). Assessment work by PER research- 
ers has provided us with a wealth of data to analyze (e.g., 
Hake, 1998) and has contributed much to the literature on 
learning in introductory physics courses (e.g., Hake, 
2002), specifically with regard to comparing “traditional” 
approaches to teaching to more interactive or innovative 
approaches. For example, Bonham et al. (2003) used the 
FMCE (in addition to other measures) to compare student 
learning between two groups: one which engaged in paper- 
based homework assignments and another which engaged 
with web-based homework. 

Discussions about the use of concept inventories in PER 
take place frequently in various communities and on 
listservs such as the Physics Learning Research List 
(PhysLrnR). Though concept inventory use is widespread, 
it is not without some debate. For example, on many 
concept inventories, students “hit the ceiling” on the post- 
test (i.e., obtain a perfect score). This becomes a potential 
issue when calculating a gain score. Another issue that has 
been discussed recently relates to the context in which 
students’ understanding is measured. Are measures of 
Newtonian Thinking that derive from scores on a concept 
inventory different from those that might derive from a 
performance-based setting (e.g., a lab)? In other words, 
researchers are thinking critically about the strengths and 
limitations of these concept inventories. Despite these 
potential issues, the use of concept inventories provides a 
sort of common ground upon which we can communicate 
and compare our findings, and as such their use remains 
popular in discipline-based education research, especially 
in PER. This article reports on research which has impli- 
cations for the “ceiling” issue while accepting the contex- 
tual limitations of this and other concept inventories. 

Measures of change in PER are most commonly based 
on a comparison of raw scores from pre-post testing. A 
student’s composite score on an exam serves as a proxy 
for their ability with respect to the construct of interest, in 
this case Newtonian thinking. Within that framework, dif- 
ferences in post- and pre-test raw scores are often normal- 
ized and used as indicators of the amount of change in 
student conceptual understanding that has occurred during 
instruction (e.g., Hake, 1998). Although these normalized 
raw score difference measures are quite useful as indica- 
tors of change in understanding, they have some problems. 
The measurement models applied to these types of analy- 
ses are a part of Classical Test Theory (CTT), which is 
based on observed raw scores and considers those scores 
to be composed of true score and error score components. 
The most notable issues with these CTT measures of 
change include the raw score bias (and resulting problems 
in scale), potential low reliability of change scores if cor- 
relation between pre- and post-test scores is high, and 
spurious relationships between gain scores and initial 
scores due to measurement error (Bereiter, 1963). 
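
The reliability concern can be illustrated with the standard CTT expression for the reliability of a difference score. The sketch below assumes equal pre- and post-test score variances (a simplifying assumption, not a property of the FMCE data), and the reliability and correlation values are hypothetical.

```python
def difference_score_reliability(rel_pre, rel_post, r_pre_post):
    """Reliability of the change score (Y - X) under CTT.

    Assumes equal pre- and post-test score variances, in which case
    rel_D = ((rel_pre + rel_post) / 2 - r_pre_post) / (1 - r_pre_post).
    """
    if not -1.0 < r_pre_post < 1.0:
        raise ValueError("pre/post correlation must lie strictly between -1 and 1")
    mean_reliability = (rel_pre + rel_post) / 2.0
    return (mean_reliability - r_pre_post) / (1.0 - r_pre_post)


# Hypothetical values: both administrations have reliability .85.
low_corr = difference_score_reliability(0.85, 0.85, 0.30)
high_corr = difference_score_reliability(0.85, 0.85, 0.80)
print(round(low_corr, 3), round(high_corr, 3))
```

With both tests at a reliability of .85, raising the pre/post correlation from .30 to .80 lowers the reliability of the change score from about .79 to .25.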

This particular study addresses the following research 
questions: At the item-level, how does an Item Response 
Theory (IRT) approach to the measurement of change on 
the FMCE compare to CTT measures of change? Further, 
do these different approaches to measuring change on the 
FMCE lead us to make different inferences about student 
learning of Newtonian Physics? 

Item Response Theory is a theoretical approach to 
designing, analyzing, and scoring tests. It is often associ- 
ated with current research and work in construct-based 
measurement (Wilson, 2005) which pays particular atten- 
tion to the construct as the theoretical object of interest. 
IRT statistical models are probabilistic models and as such 
generate estimates of a respondent’s ability or the difficulty 
of items on a test. Because of the rigorous development 
process and strong statistical foundations, these 
methods are often used in the development and analysis of 
high-stakes tests such as the Graduate Record Exam 
(GRE). However, because of these same characteristics, IRT-based 
development and analyses are intensive, difficult to 
learn and carry out, and results are not always straightfor- 
ward to interpret. 

The next section of this article will provide some back- 
ground on the FMCE. Both CTT and IRT measures of 
change on the FMCE will be presented, and the assump- 
tions underlying each of these approaches will be dis- 
cussed and contrasted. The methods section then describes 
the sample used in this study and the CTT and IRT analy- 
sis results (linked to specific items on the FMCE). It will 
be shown that an examination of how item difficulty 
changes from pre- to post-test can provide researchers 
with more detailed information about changes in student 
understanding as compared to aggregate test scores. By 
considering the items as an indicator of change, one can 
isolate and examine specific aspects of Newtonian think- 
ing. Further, there exists a stronger statistical basis for 
making these item comparisons under the assumptions of 
IRT (as opposed to CTT), which will be discussed in the 
methods section. Finally, the discussion section will syn- 
thesize the results of the study and suggest future direc- 
tions for research. 


The Force and Motion Conceptual Evaluation 

The FMCE was designed to characterize students’ con- 
ceptual understanding of Newtonian mechanics (Thornton 
& Sokoloff, 1998). More specifically, it is intended to 
measure student understanding of kinematics and New- 
ton’s laws in one dimension which are generally covered 
in introductory physics courses. The original purpose of 
the FMCE was one of formative assessment, as it was 
intended to be useful as a guide to instruction by indicating 
in which areas of mechanics student views differed from 
those of a physicist (Thornton & Sokoloff, 1998). 
However, many current uses of the FMCE are for charac- 
terizing change in students’ views, which are more of a 
summative or evaluative form of assessment rather than a 
formative one. Thornton, Kuhl, Cummings, & Marx 
(2009, p. 2) state that “the FMCE was not originally 
designed to have results analyzed with a single-number 
score, but to begin our comparison, we felt it necessary to 
create such a score for the exam.” In this way, a student’s 
(and a class’) FMCE change scores are used to character- 
ize changing views about physics understanding. These 
measures of change are then compared across courses in 
order to make comparisons of instructional efficacy (e.g., 
comparing Interactive Engagement [IE] courses with traditional 
courses (e.g., Hake, 1998)). Indeed, the FMCE 
authors themselves have used the instrument for making 
these cross-course comparisons. The most common use of 
this instrument is therefore quite an extension of the origi- 
nal design intention. 

The FMCE is used in both algebra and calculus-based 
general physics courses. These courses usually have large 
enrollments (between 60 and 500 students per semester). 
A recent meta-analysis (Ruiz-Primo, Briggs, Iverson, 
Talbot, & Shepard, 2011) identified 148 comparative 
studies from 74 papers in physics education, six of which 
used the FMCE and normalized gain scores to compare an 
innovative instructional approach to a more traditional 
one. For example, Cummings, Marx, Thornton, and Kuhl 
(1999) used the FMCE to evaluate the effectiveness of 
Interactive Lecture Demonstrations, Cooperative Group 
Problem Solving, and a standard Studio Physics course. 
The goal of this work was to characterize the effect of 
incorporating research-based activities into the Studio 
Physics course. In another set of studies, Smith and 
Wittmann (2007) used the FMCE to compare the effect of 
different tutorials on students’ understanding of Newton’s 
Third Law. Overall pre-test FMCE score was used to 
establish group equivalence. A subset of FMCE items was 
also used for pre-post comparisons using normalized gain 
scores. 

The FMCE consists of 47 multiple choice items, each 
with between five and nine answer choices (some of which 
are purposeful distractors). The authors score the FMCE 
on a scale of 0 to 33 points, which is based on a composite 
of the first 43 questions on the instrument. Sets of questions 
make up categories which are all parts of the construct 
“Newtonian Thinking.” For example, questions 
8-10 (see Figure 1) deal with the force on a cart moving on 
a ramp, questions 11-13 deal with the force on a coin 
tossed into the air, and questions 27-29 deal with the 
acceleration of a coin tossed into the air. In order to be 
deemed a “Newtonian thinker,” a student must answer all 
three of the questions in each of these groups correctly. 
The composite score derived from the first 43 questions 
depends upon these categorical groupings. In practical 
analyses, a composite raw score of about 40% (of the 33 
points possible) or below is indicative of non-Newtonian 
thinking (Thornton et al., 2009). 

Questions 8-10 refer to a toy car which is given a quick push so that it rolls up an inclined ramp. 
After it is released, it rolls up, reaches its highest point and rolls back down again. Friction is so 
small it can be ignored. 

Use one of the following choices (A through G) to indicate the net force acting on the car for 
each of the cases described below. Answer choice J if you think that none is correct. 

A  Net constant force down ramp 
B  Net increasing force down ramp 
C  Net decreasing force down ramp 
D  Net force zero 
E  Net constant force up ramp 
F  Net increasing force up ramp 
G  Net decreasing force up ramp 

8. The car is moving up the ramp after it is released. 
9. The car is at its highest point. 
10. The car is moving down the ramp. 

Figure 1. Common grouping of items dealing with force acting on a car on a ramp. 


When the FMCE was created, many physics experts 
thought the items to be too simple. They “expected that 
most [students] would answer in a Newtonian way after 
traditional physics instruction at a selective university.” 
Even after obtaining student responses which showed that 
very few changed their views after traditional instruction, 
“some professors suggested that perhaps the questions are 
not significant (or valid or reliable) measures of students’ 
knowledge” (Thornton & Sokoloff, 1998, pp. 338-339). In 
discussing the validity of the FMCE, Thornton and 
Sokoloff report quite a difference in student responses 
between those in IE courses and those in traditional 
courses. In addition, during development of the FMCE, 
Thornton and Sokoloff administered the test to “hundreds” 
of physics faculty, and compared student responses from 
the multiple choice version to those from an open-ended 
version which prompts for explanation. They found a very 
high correlation between these two forms of the test (Saul, 
1998). Though there are many pieces of evidence for the 
validity of the FMCE, there has been no coherent validity 
argument developed with the depth suggested by frame- 
works such as the Standards for Educational and Psy- 
chological Testing (American Educational Research 
Association, American Psychological Association, & 
National Council on Measurement in Education, 1999). 


Common Approaches to Measuring Change on 
the FMCE 

The authors of the FMCE intended student responses to 
be analyzed on the basis of raw scores using Classical Test 
Theory (CTT). In this approach, the composite raw score 
on the instrument is the statistic for representing the latent 
variable (in this case, the students’ ability to think in 
Newtonian terms). Under the assumptions of CTT, this 
observed raw score can be decomposed into a true score 
(fixed factor) and an error component (random effect). 
Comparisons between scores (individual students or class 
averages) make no distinction in location on the score 
continuum. For example, a difference of two points near 
the bottom of the score range is considered to be the same 
interval as a difference of two points in the middle of the 
score range. In other words, an interval scale is assumed to 
exist across the scoring range. 

The most common gains analysis applied to the scores 
from a physics concept inventory (such as the FMCE) is 
the average normalized gain <g>, which Hake (1998) 
introduced as 


$$\langle g \rangle = \frac{\langle G \rangle}{\langle G \rangle_{\max}} = \frac{Y - X}{100 - X} \qquad (1)$$


where X and Y are the class averages (expressed as a 
percentage) on a particular physics concept inventory taken 
at the beginning (i.e., “pre” [X]) and end (i.e., “post” [Y]) 
of an introductory course in physics. <G> and <G>max 
represent the class average (raw) gain from pre- to post-tests 
and the maximum possible class gain respectively 
(both expressed as percentages). <G>max is the normalizing 
factor which serves to scale the gains (<G>) and attempts to 
deal with the observed ceiling effects of the instrument 
(many of the observed final scores Y are 100%). This 
normalization makes the assumption of an interval scale 
across the score continuum. However, in reality, these raw 
scores are not on an interval scale. They are ordinal at best. 
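
As a worked illustration of Equation 1, the following minimal sketch computes the normalized gain from class averages. The function name and the class averages are hypothetical and are not values from this study.

```python
def normalized_gain(pre_pct, post_pct):
    """Hake's average normalized gain <g> = (Y - X) / (100 - X).

    pre_pct (X) and post_pct (Y) are class averages in percent.
    """
    if pre_pct >= 100:
        raise ValueError("<g> is undefined when the pre-test average is 100%")
    return (post_pct - pre_pct) / (100.0 - pre_pct)


# Hypothetical class averages, not values from this study.
g_low_start = normalized_gain(30.0, 65.0)   # raw gain of 35 points
g_high_start = normalized_gain(60.0, 95.0)  # same raw gain of 35 points
print(round(g_low_start, 3), round(g_high_start, 3))
```

The two hypothetical classes have the same 35-point raw gain but normalized gains of .50 and .875, and any post-test score of 100% yields a value of 1 regardless of the starting point.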

For the data used in this study, <g> was calculated to be 
.64. This is based on the 336 pre/post-test matched raw 
scores derived from Thornton and Sokoloff’s 0-33 point 
scoring algorithm (see above section on the FMCE). This 
is a very high value for <g> compared to those reported in 
many other studies, and is nearly in the “high-g” range as 
defined by Hake (1998). It is important to note that 51 of 
the 336 matched scores (approximately 15%) had a gain of 
1, indicating that they had a perfect score on the post-test. 

Commonly, analyses using <g> focus on the entire 
population of students in the course as the unit of analysis. 
They do not often consider individual students or items as 
cases for analysis. If such analyses do provide data about 
responses to particular items or sub-concepts within the 
framework of Newtonian thinking, it is done under the 
assumptions of CTT (since these comparisons are 
based on raw scores). For example, Thornton and 
Sokoloff (1998) report the percent correct pre- and 
post- for various questions on the FMCE as “effect[s] of 
traditional instruction” (p. 339, Figure 1). Coupled with 
various correlation studies (involving subgroups by 
student demographics, education, etc.), measures of <g> 
are accepted by much of the PER community as a basis for 
making inferences about efficacy of instruction (e.g., 
Cummings et al., 1999; Meltzer, 2002). For example, 
based on measures of <g> on FCI and FMCE administra- 
tions, Cummings et al. (1999) determined that Coopera- 
tive Group Problem Solving (an instructional innovation) 
led to gains in conceptual understanding. 

Problems in Measuring Change Using CTT 

As mentioned above, raw score measures of change have 
some limitations. Bereiter (1963) identifies three main 
“dilemmas”: (a) the “over-correction-under-correction 
dilemma,” (b) the “unreliability-invalidity dilemma,” and 
(c) the “physicalism-subjectivism dilemma.” I will discuss 
each of these in turn, as well as ways in which they can be 
dealt with. 

Observed pre-test scores and change scores share the 
same elements of measurement error (with opposite 
signs). Consider the following expressions for observed 
pre-test score (X), observed post-test score (Y), and 
observed change score (Y — X): 


$$X = X_t + e_x \qquad (2)$$
$$Y = X_t + G_t + e_y \qquad (3)$$
$$Y - X = G_t + e_y - e_x \qquad (4)$$


In Equations 2-4, $X_t$ represents true pre-test score, $e_x$ represents 
random error on the pre-test, $e_y$ represents random 
error on the post-test, and $G_t$ represents the true change 
score. Note that algebraically, the observed pre-test (X) 
and change scores (Y - X) share the same error in measurement 
($e_x$) with opposite signs. Because of this, there exists 
a “spurious negative element” in their correlations. In 
other words, when raw gain (change) score (Y - X) is 
regressed on initial score (X), the correlation will likely be 
understated due to the fact that in part, it is a regression of 
$-e_x$ on $+e_x$. This shared component of measurement error 
calls for a correction in the regression of gains on initial 
scores. This regression itself (of gain score on initial score) 
is necessary in order to characterize the reliability of the 
change measurement. 
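
The spurious negative element can be seen in a small simulation of Equations 2-4. The sketch below is illustrative only: the score distributions, error standard deviation, and sample size are assumptions, scores are left unbounded (no floor or ceiling), and true gain is generated independently of true pre-test score.

```python
import random


def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def simulate(n_students=20000, error_sd=3.0, seed=0):
    """Simulate Equations 2-4 with true gain independent of true pre-test score."""
    rng = random.Random(seed)
    true_pre = [rng.gauss(15.0, 5.0) for _ in range(n_students)]   # X_t
    true_gain = [rng.gauss(8.0, 3.0) for _ in range(n_students)]   # G_t
    pre = [xt + rng.gauss(0.0, error_sd) for xt in true_pre]       # X = X_t + e_x
    post = [xt + gt + rng.gauss(0.0, error_sd)                     # Y = X_t + G_t + e_y
            for xt, gt in zip(true_pre, true_gain)]
    gain = [y - x for x, y in zip(pre, post)]                      # Y - X
    return pearson(true_pre, true_gain), pearson(pre, gain)


r_true, r_observed = simulate()
print(f"true-score correlation: {r_true:.3f}")
print(f"observed correlation:   {r_observed:.3f}")
```

Under these assumed settings the expected observed correlation is about -.30, even though true gain and true pre-test score are uncorrelated; the negative value comes entirely from the shared error term.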

However, the correction for this regression is not 
straightforward. As Bereiter (1963) notes, the work of 
Garside shows us that three different methods of solving 
for this regression (all of which are “plausible”) provide us 
with three widely varying results (an increase in correla- 
tion, a decrease in correlation, and an indifference). 
Depending on which method (or whether some other 
method, such as a partial correlation) is used, the correc- 
tion to account for this error sharing element of pre-test 
and gain scores will either be overstated or understated. 
Most research reports the uncorrected correlations. 

The most common concern with change scores has to do 
with the “unreliability-invalidity dilemma.” Related to the 
problem with regressing gain scores on initial scores, this 
dilemma presents itself as a result of these correlations. 
Researchers would usually like to see a low correlation as 
a result of this regression, which indicates a higher reli- 
ability for the gain scores. However, the problem with this 
logic is that a very low correlation between gain scores and 
initial test scores brings into question the validity of the 
instrument. If these things are not correlated, then it can be 
argued that the instrument used to obtain the observed pre- 
and post-test scores does not measure the same construct 
(i.e., construct definition has changed for the sample from 
pre- to post-administrations). If the test is therefore not 
valid, then the change scores on that test lack substantive 
meaning. This paradoxical relationship has been dealt with 
in numerous ways, and it can be shown that despite the 
above logic, gain scores can be reliable without having to 
show low correlations between gain scores and initial 
scores (Willett, 1988-89). 

For this data set, the correlation between <g> and pre- 
test scores is .026, which is lower than the correlation 
between raw gain scores and pre-test scores (.174). 
However, it is still reasonable to believe that this difference 
is within the range of measurement error in the scores and 
is therefore subject to Bereiter’s first two dilemmas. 

The most persistent dilemma in measuring change 
under the assumptions of CTT has to do with what 
Bereiter calls “physicalism-subjectivism.” This has to do 
with the scale properties of CTT measurement models, 
namely that these models assume interval scaling in which 
equal changes in units anywhere along the scale account 
for equal changes in the construct being measured. When 
this dilemma presents itself (as it always does in the mea- 
surement of change), Bereiter states that one has “the 
unpleasant option of sticking with the particular scale 
units given or some rather arbitrary transformation of 
them (physicalism), or else abandoning the given units in 
favor of others that seem to conform to some underlying 
psychological units (subjectivism)” (1963, p. 5). Although 
it is easy to pick some transformation of scale and ignore 
this dilemma, it is especially problematic when many of 
the raw scores observed are near the extremes. In the 
present sample, roughly 15% of the students hit the ceiling 
(i.e., obtained a perfect score) on the FMCE post-test. 

Taken together, “these problems seem irresolvable 
because the change measurements are based on CTT, in 
which the estimation of item and person parameters is 
mutually confounded” (Wang & Chyi-In, 2004). The 
application of IRT to the measurement of change on the 
FMCE will focus on isolating the items from the persons 
and examining the change in their difficulties. This is 
especially appropriate for an analysis of the FMCE, since 
it can be broken down into subsets of items relating to the 
construct of Newtonian Thinking as outlined by its design- 
ers. CTT-based approaches are not as well suited to this 
type of analysis. 

CTT modeling does not allow the simultaneous assess- 
ment of multiple aspects of examinee competence and 
does not address problems that arise whenever separate 
parts of a test need to be studied or manipulated. Formally, 
CTT does not include components that allow interpretation 
of scores based on subsets of items in the test 
(Pellegrino, Chudowsky, & Glaser, 2001, pp. 120-1). 


Methods 
Sample 

The current study was conducted at a large public 
research university in the mountain west, where the FMCE 
is routinely administered to students in introductory, 
calculus-based physics courses pre- and post-instruction. 
The course from which the sample was drawn is the first in 
a three-course sequence for science and engineering stu- 
dents, is calculus based, and covers a mechanics curricu- 
lum. The data for this study come from the spring semester 
of 2004, which was taught using IE methods. Specifically, 
the course instructor utilized clickers and the Peer Instruc- 
tion model (Mazur, 1997), and made use of Learning 
Assistants (LAs; Otero, Finkelstein, McCray, & Pollock, 
2006). The LAs worked primarily in the associated 
recitation/lab sections (~25 students in each) which 
used the Washington Tutorials in Introductory Physics 
(McDermott & Shaffer, 2002). The course also used an 
online interactive homework system (CAPA: Computer- 
Assisted Physics Assignments) and a help room for 
physics students which was staffed daily from 9:00-5:00. 
It should also be noted that the course instructor was very 
experienced in teaching using IE methods. 

The total number of FMCE pre-test respondents was 
468, and total number of post-test respondents was 410. 
Matched pre- and post-test data exist for 336 students. 
This is important to note because any gains analysis under 
CTT can use only these 336 matched student responses. 
The IRT-based approach to examining change through 
item difficulty analysis is able to use all respondent data 
(468 pre- and 410 post-, representing 531 distinct respon- 
dents in total). This is because the analyses focus on the 
“items” themselves, rather than the respondents, and in 
IRT the estimation of item and person parameters is not 
mutually confounded, as in CTT. 

Approximately 75% of the respondents were male, and 
25% were female. Other background variables such as 
race and socioeconomic status (SES) were not available. 

Analysis 

The initial analysis treated all 47 items on the FMCE as 
independent and dichotomously scored. I used a Rasch 
model (Bond & Fox, 2007; Rasch, 1980) to estimate the 
item difficulty parameters for each of the 47 items on the 
pre-test (n = 468). 


$$P(X_{is} = 1 \mid \theta_s, \beta_i) = \frac{\exp(\theta_s - \beta_i)}{1 + \exp(\theta_s - \beta_i)} \qquad (5)$$

This one-parameter logistic model (Equation 5) estimates 
the probability of correct response to an item i by a person 
s. In the model, $X_{is}$ represents the response of person s to 
item i, $\theta_s$ represents the ability estimate (i.e., trait level) of 
person s, and $\beta_i$ is the difficulty of item i. For the purposes 
of this analysis, the person ability estimate ($\theta_s$) was constrained 
to have a mean of zero while item difficulty ($\beta_i$) 
was free to be estimated by the modeling software. 
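
As a minimal sketch of how the model in Equation 5 behaves, the function below evaluates the Rasch probability for a given ability and item difficulty. The function name and the values passed to it are hypothetical and are not estimates from this study.

```python
import math


def rasch_probability(theta, beta):
    """P(X = 1 | theta, beta) under the Rasch (one-parameter logistic) model."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))


# Hypothetical values on the logit scale.
p_matched = rasch_probability(0.0, 0.0)   # ability equals difficulty
p_easier = rasch_probability(0.0, -1.5)   # item easier than the person's ability
p_harder = rasch_probability(0.0, 1.5)    # item harder than the person's ability
print(p_matched, round(p_easier, 3), round(p_harder, 3))
```

When ability equals difficulty the probability of a correct response is .5, and lowering the difficulty (a smaller, possibly negative value) raises that probability.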

I used the IRT modeling software ConQuest (Wu, 
Adams, & Wilson, 1997) to estimate the item difficulty 
parameters from the pre-test data. Once the pre-test item 
difficulties were obtained, a second data set was created 
that included both pre-test and post-test items (94 items 
total) for all respondents (n = 531). In this fashion, I could 
command the software to freely estimate the post-test item 
difficulties while anchoring the pre-test item difficulties 
which were previously modeled. Direct comparisons could 
then be made between the values obtained for pre-test and 
post-test item difficulties for each item. 
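
The combined data set can be pictured as one row per distinct respondent with 94 columns, where a respondent who missed one administration has missing values for those 47 columns. The sketch below illustrates only this layout; the function and variable names are hypothetical, and it does not reproduce ConQuest's input format or command syntax.

```python
N_ITEMS = 47  # items per FMCE administration


def stack_responses(pre, post):
    """Build one row of 94 responses per distinct respondent.

    pre and post map a respondent ID to a list of 47 scored responses (0 or 1).
    A respondent missing from one administration gets None for those 47 columns.
    """
    missing = [None] * N_ITEMS
    rows = {}
    for student_id in sorted(set(pre) | set(post)):
        rows[student_id] = pre.get(student_id, missing) + post.get(student_id, missing)
    return rows


# Hypothetical respondents: "a" took both tests, "b" only the pre-test,
# and "c" only the post-test.
pre = {"a": [0] * N_ITEMS, "b": [1] * N_ITEMS}
post = {"a": [1] * N_ITEMS, "c": [0] * N_ITEMS}
stacked = stack_responses(pre, post)
print(len(stacked), len(stacked["a"]))
```

Because item difficulties are estimated from whichever responses are present in each column, unmatched respondents still contribute to the pre-test or post-test item estimates.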

I also conducted a secondary analysis in which I divided 
the FMCE into 11 “testlets” (Wainer & Kiely, 1987) based 
on content groupings similar to those discussed above. 
This was done in order to deal with the violation of the 
local independence assumption of IRT. Due to the fact that 
groupings of items shared common answer pools, it is 
reasonable to assume that items within these groups were 
locally dependent on one another. In this analysis, I used a 
Partial Credit Model (PCM; Masters, 1982) to analyze the 
resulting polytomous testlet item data. This model is 
part of the Rasch family of IRT models, and is given by 
Equation 6. 
$$P(X_i = x \mid \theta) = \frac{\exp \sum_{j=0}^{x} (\theta - \delta_{ij})}{\sum_{r=0}^{m_i} \exp \sum_{j=0}^{r} (\theta - \delta_{ij})}, \quad \text{where } \sum_{j=0}^{0} (\theta - \delta_{ij}) \equiv 0 \qquad (6)$$

This equation gives the probability of a person with ability 
$\theta$ responding to item i with response x (where item i is 
scored from x = 0 to $m_i$ [the maximum number possible for 
item i]). The category response is denoted by j, and the 
parameter $\delta_{ij}$ represents step difficulty: the location on 
the latent ability continuum where a respondent has a 50% 
probability of a response in category x relative to category 
x - 1. 
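
The interpretation of the step difficulty can be checked numerically. The sketch below assumes the usual Masters (1982) parameterization of the PCM; the function name and the step difficulty values are hypothetical and are not estimates from this study.

```python
import math


def pcm_probabilities(theta, step_difficulties):
    """Category probabilities for one polytomous item under the PCM.

    step_difficulties holds delta_i1 ... delta_im; the returned list has
    m + 1 entries, one for each score x = 0 ... m.
    """
    # Cumulative sums of (theta - delta_ij); the sum for x = 0 is defined as 0.
    cumulative = [0.0]
    for delta in step_difficulties:
        cumulative.append(cumulative[-1] + (theta - delta))
    numerators = [math.exp(value) for value in cumulative]
    total = sum(numerators)
    return [value / total for value in numerators]


# Hypothetical three-item testlet (scores 0-3) with made-up step difficulties.
steps = [-1.0, 0.2, 1.1]
probs = pcm_probabilities(0.2, steps)
print([round(p, 3) for p in probs])

# At theta equal to the second step difficulty, scores 1 and 2 are equally likely.
print(math.isclose(probs[1], probs[2]))
```

With a single step the function reduces to the dichotomous Rasch model, which is why the PCM is described as part of the Rasch family.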

Again using the ConQuest software (Wu et al., 1997), I 
ran a PCM on the pre-test response data (n = 468) in order 
to obtain item parameter estimates. These estimates were 
used to anchor the subsequent run, which included both 
pre- and post-test responses (n = 531). Again, in the first 
model run, person ability estimates were constrained to 
have a mean of zero so that item parameters were free to be 
estimated by the model. The second (combined) run had 
double the number of testlets (22 total). In this run, pre-test 
item (testlet) difficulty parameters were anchored to those 
obtained in the first run, and post-test item (testlet) diffi- 
culty parameters were estimated relative to these pre-test 
values. The resulting item parameter estimates from both 
runs (pre-test items from the first run and post-test items 
from the second run) serve as the basis for this secondary 
analysis. 

The description of the IRT models used makes explicit 
two of the greatest limitations of IRT: (a) estimation pro- 
cedures for both person ability and item difficulty are 
complex and not straightforward, and (b) many practitio- 
ners and researchers lack the knowledge and experience to 
carry out such procedures and interpret the results. CTT- 
based calculations and score interpretations are quite 
intuitive and well accepted by many educators and 
researchers. IRT can appear to be a sort of “black box” 
which does not lend itself to widespread adoption and use. 


Results 
Dichotomous Rasch Analysis 
For both analyses, I express changes in item difficulty 
from pre- to post-test in terms of effect sizes. Equation 7 
gives the effect size (E) calculation based on CTT item 
difficulty, and Equation 8 gives that for IRT item difficulty 
estimates. 


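$$E_{CTT} = \frac{p_{post} - p_{pre}}{SD_{p\_pre\_all\_items}} \qquad (7)$$
$$E_{IRT} = \frac{\left| \beta_{post} - \beta_{pre} \right|}{SD_{\beta\_pre\_all\_items}} \qquad (8)$$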
In Equation 7, $p_{post}$ represents item difficulty for the item in 
the post-test, $p_{pre}$ represents the item difficulty for the item 
in the pre-test, and $SD_{p\_pre\_all\_items}$ is the standard deviation 
for all 47 pre-test item difficulties (p values). In Equation 
8, $\beta_{post}$ represents the IRT item difficulty parameter for the 
item in the post-test, $\beta_{pre}$ represents the IRT item difficulty 
parameter for the item in the pre-test, and $SD_{\beta\_pre\_all\_items}$ is 
the standard deviation for all 47 pre-test item difficulties 
($\beta$ values). The absolute value is taken for the difference 
between post- and pre-item difficulties in the IRT calculation 
due to the fact that the scale is inverted relative to the 
p values in CTT. In IRT for example, $\beta$ values become 
smaller (even negative) as the item gets easier. Plots of 
CTT and IRT change in item difficulty effect sizes for the 
first (47 item) analysis are shown in Figure 2. In comparing 
the two plots visually, one notices a general compression 
in effect size for change in IRT difficulties relative to 
that for CTT difficulties. 

Figure 2. CTT and IRT change in item difficulty effect sizes. 

Note that a direct quantitative comparison between the 
CTT and IRT item difficulty effect sizes is not possible due 
to differences in variance and scale. For example, the CTT 
item difficulties for the pre-test items have an SD = .24, 
while the IRT item difficulties for the pre-test have an SD = 
1.71. Because the CTT difficulties are on an ordinal scale 
and the IRT difficulties are on an interval scale, normalizing 
both sets of difficulties for direct comparison is not a 
statistically sound strategy either. Because of these issues, I 
will discuss the item difficulties from the two measurement 
models as they pertain to each item without quantitatively 
comparing the two different effect size measures. 
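
The effect size calculations in Equations 7 and 8 can be sketched as follows. The function names are hypothetical. The CTT example uses the rounded proportions correct reported for item 37 in the discussion of this analysis (58% and 83%) together with the pre-test SD of .24; the IRT difficulty values are invented for illustration, and only the SD of 1.71 comes from the text.

```python
def ctt_effect_size(p_pre, p_post, sd_p_pre):
    """Equation 7: change in proportion correct, in pre-test SD units."""
    return (p_post - p_pre) / sd_p_pre


def irt_effect_size(beta_pre, beta_post, sd_beta_pre):
    """Equation 8: absolute change in Rasch difficulty, in pre-test SD units."""
    return abs(beta_post - beta_pre) / sd_beta_pre


# Item 37, using the rounded proportions correct and the pre-test SD of .24.
e_ctt_37 = ctt_effect_size(0.58, 0.83, 0.24)

# Hypothetical difficulty estimates (logits); only the SD of 1.71 is from the text.
e_irt_example = irt_effect_size(beta_pre=-0.40, beta_post=-1.90, sd_beta_pre=1.71)

print(round(e_ctt_37, 2), round(e_irt_example, 2))
```

The rounded inputs give a CTT effect size of about 1.04 for item 37, close to the 1.01 listed in Table 1; the small difference is presumably due to rounding of the reported percentages and SD.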

Discussion of dichotomous Rasch analysis. In each of 
the plots of change in item difficulty effect size, there are 
groups of items that clearly have lower effect sizes than the 
others. Although the order of the lowest effect size items is 
slightly different in the IRT and CTT models, the items in 
this grouping are the same (see Table 1) for each measure- 
ment model. 

Table 1 
Lowest Change in Item Difficulty Effect Sizes for Both CTT and IRT Models 

CTT Lowest Item                 IRT Lowest Item 
Difficulty Effect Sizes         Difficulty Effect Sizes 

Item    Effect Size             Item    Effect Size 
37      1.01                    37      .90 
41      .62                     42      [illegible] 
42      [illegible]             41      .67 
43      [illegible]             43      .59 
40      [illegible]             15      .49 
15      [illegible]             40      .26 
33      [illegible]             33      [illegible] 

CTT = Classical Test Theory; IRT = Item Response Theory. 


Item 37 is one of a group of questions about a car 
pushing a truck and deals with Newton’s Third Law. 
Fifty-eight percent of respondents answered this question 
correctly on the pre-test, and 83% answered it correctly 
on post-test. An interesting question to ask is why the 
change in item difficulty for this question is so much 
lower than that for the related questions, 35, 36, and 38? 
Items 36 and 38 were extremely difficult for pre-test 
respondents (8% and 7% correct, respectively) and deal 
with Newton’s Third Law and the concept of acceleration 
in the same situation. Part of the low effect size for item 
37 can be accounted for by the fact that it was the easiest 
of these four items to begin with, and therefore did not 
have as much room to change. I will further examine 
this grouping in the polytomous testlet item analysis, as 
these items together make up one of the testlet groupings 
(testlet 8). 

Items 40 through 43 ask the respondent to choose appropriate 
velocity-time graphs to describe the motion of a car 
in different situations. These items were fairly easy for 
pre-test respondents (71-90% correct) and very easy for 
post-test respondents (86-95%). Again, I will further 
examine these items as a group (these items comprise 
testlet 10) in the next section. 

Item 15 asks the respondent to choose the force-time 
graph which represents a car at rest. This item was very 
easy on both the pre-test and post-test, with 94% and 97% 
of respondents, respectively, answering it correctly. Again, because it was 
relatively easy on the pre-test, there is not much room for 
growth or change. 

Item 33 was very easy for respondents on both pre- and 
post-tests. It deals with a collision between vehicles of 
equal mass and asks respondents about the forces acting 
on the vehicles during the collision. As with item 15, 
because this item was already so easy on the pre-test, 
there is little room for change. 
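
The "room to change" argument can be made concrete with simple arithmetic on the percentages reported above for items 37 and 15. The sketch below is illustrative only: `headroom_summary` is a hypothetical helper, and the quantities it returns are not the change in item difficulty effect size used in this article.

```python
# Percent correct (pre, post) as reported in the text for two items.
items = {37: (58, 83), 15: (94, 97)}

def headroom_summary(pre, post):
    """Return raw gain, available headroom, and fraction of headroom used."""
    gain = post - pre        # percentage points gained from pre to post
    headroom = 100 - pre     # largest gain that was possible
    return gain, headroom, gain / headroom

for item, (pre, post) in items.items():
    gain, headroom, frac = headroom_summary(pre, post)
    print(f"Item {item}: gain = {gain} points of {headroom} available "
          f"({frac:.2f} of headroom)")
```

Item 15 could gain at most 6 percentage points, so even a class that closed half of that gap would show a very small raw change.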

Table 2 
Highest Change in Item Difficulty Effect Sizes for Both CTT and IRT Models 

CTT Highest Item                IRT Highest Item 
Difficulty Effect Sizes         Difficulty Effect Sizes 

Item    Effect Size             Item    Effect Size 
30      2.83                    30      [illegible] 
34      [illegible]             34      2.44 
32      2.67                    32      2.40 
28      2.61                    28      2.39 
12      2.59                    12      2.28 
11      2.54                    31      2.23 
31      2.48                    11      2.21 

CTT = Classical Test Theory; IRT = Item Response Theory. 


At the other end of the effect size range, there are similar 
groupings of items that have the highest change in item 
difficulty effect sizes under both the CTT and IRT models. 
These items and their effect sizes are shown in Table 2. 

Items 30, 31, 32, and 34 are all part of the same group- 
ing on the FMCE and deal with collisions and Newton’s 
Third Law paired forces. These items were all quite diffi- 
cult for respondents on the pre-test (between 18 and 28% 
answered them correctly) and were somewhat easy for 
respondents on the post-test (between 83 and 87% 
answered them correctly). These changes indicate a large 
growth in student understanding regarding this particular 
topic. Item 33 (which is also part of this grouping) was 
discussed above and was one of the easiest items overall 
and therefore did not have much room for change. 
Together, items 30 through 34 make up testlet 7 and will be 
discussed in the next section. 

Item 28 is one of a group of three items (in testlet 6) 
which asks students about the acceleration of a coin tossed 
straight up into the air. This particular item asks about the 
acceleration at the top of the trajectory. Nineteen percent 
of students answered this correctly on the pre-test, and 
82% answered it correctly on the post-test. The idea that 
the coin has an acceleration equal to -9.8 m/s² (the 
acceleration due to gravity, g) at the top of its trajectory is a 
difficult concept for students to understand. 

Items 11 and 12 also refer to a coin tossed into the air, 
but instead of asking students about the acceleration of the 
coin, these items ask students about the force acting on the 
coin. For both items, 19% of students responded correctly 
on the pre-test, and 80-82% responded correctly on the 
post-test. It is reasonable to think that on the pre-test, 
students held the alternative conception that, because 
the coin was either moving upward (question 11) or 
motionless at the top (question 12), there could not be 
a downward force on the coin (the force due to gravity). 

Figure 3. Testlet change in item difficulty effect size. 


Because questions 12 and 28 (discussed above) are so 
closely related, it is not surprising to see low pre-test 
scores on item 12 after having examined item 28. Data 
from items 11 and 27 indicate discordant thinking on the 
part of the respondents regarding force and acceleration on 
the upward-moving coin. 

In the next section, I will examine many of these items 
grouped together into testlets. Because I do not have set 
criteria for success on each testlet that can be expressed as 
p-values (under CTT assumptions), the discussion will 
deal only with testlet item difficulties as obtained by the 
polytomous PCM analysis. 
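
For readers less familiar with the Partial Credit Model (Masters, 1982), the following Python sketch computes the probability of each score category for a single polytomous item given a person ability and a set of step difficulties. The function name and the numeric values are assumptions made for illustration; the sketch shows the form of the model only and does not reproduce the estimation used in this study.

```python
import math

def pcm_probabilities(theta, step_difficulties):
    """Category probabilities under the Partial Credit Model.

    theta: person ability in logits.
    step_difficulties: step parameters delta_1..delta_m in logits.
    Returns a list of m + 1 probabilities for scores 0..m.
    """
    # Cumulative sums of (theta - delta_j); the empty sum for score 0 is 0.
    cumulative = [0.0]
    for delta in step_difficulties:
        cumulative.append(cumulative[-1] + (theta - delta))
    numerators = [math.exp(c) for c in cumulative]
    total = sum(numerators)
    return [n / total for n in numerators]

# Hypothetical three-item testlet (scores 0..3) with made-up step values.
probs = pcm_probabilities(theta=0.0, step_difficulties=[-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])
```

With a single step difficulty the function reduces to the dichotomous Rasch model, which is why the testlet analysis can be read as an extension of the item-level analysis above.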

Polytomous PCM Analysis 

As would be expected, the average item (testlet) diffi- 
culties decreased from pre-test to post-test (see Figure 3). 
What is worth looking at in detail is the relative change in 
difficulty between testlets and the content of each testlet. 
For example, the largest decreases in testlet difficulty 
are seen in testlets 3 and 6. Each of these testlets 
decreased in difficulty by about 2.3 effect size units. 
Both testlets deal with the same physical phenomenon 
(the coin tossed into the air). On the other hand, the 
smallest decreases in testlet difficulty are seen in 
testlet 9 (Newton’s Third Law) and testlet 10 
(one-dimensional motion and velocity-time graphs). 
Testlet 10 was the easiest testlet to begin with (it is 
composed of items 40-43, which are discussed above), so its 
difficulty could not change much from pre-test to post-test. 
Testlet 9, however, was of average difficulty initially. 

Discussion of polytomous PCM analysis. Given the 
above information about changes in item difficulties from 
pre-test to post-test, I am now in a position to make some 
initial inferences about student learning in the specific 
areas of Newtonian thinking. The largest gains made by 
this class were on the items dealing with the force and 
acceleration of a coin tossed into the air (testlets 3 and 6). 
Although the items in these groups were some of the more 
difficult ones on the pre-test, they were not the most dif- 
ficult. Testlets 2, 1, and 4 were all more difficult for the 
students. Therefore, it is reasonable to think that the large 
changes in item difficulty effect size for testlets 3 and 6 are 
not merely artifacts of their initial difficulty. It would be 
interesting to investigate how the topic of free fall was 
represented in the course curriculum and instruction rela- 
tive to other topics, such as Newton’s Second Law (testlet 
1, which showed lower gains). 

The areas that showed the lowest change in item diffi- 
culty effect size were testlets 5 and 9. Testlet 9 consisted of 
only one item which deals with Newton’s Third Law. 
Thirty-one percent answered this item correctly on the 
pre-test, and 84% answered it correctly on the post-test, 
making it an item of mid-range difficulty initially. Testlet 5 
presents the student with acceleration vs. time graphs 
related to the motion of a car on a ramp. These items were 
also of mid-range difficulty initially (32 to 51% answered 
correctly on pre-test) and of moderate difficulty on the 
post-test (73 to 80% answering correctly). It is somewhat 
surprising that these items (which represent concepts basic 
to Newtonian thinking, namely kinematics in one dimen- 
sion) were not easier on the post-test. 


Conclusion 

In answering the first research question (At the item 
level, how do IRT approaches to the measurement of 
change on the FMCE compare to CTT measures of 
change?) I find that the changes in item difficulty as deter- 
mined by the two measurement models are not all that 
different. In the first analysis, the change in item difficulty 
effect sizes as found under the CTT and IRT models lead 
one to examine the same sets of questions. Groupings of 
items that had the highest and lowest change in item dif- 
ficulty effect sizes were the same regardless of the mea- 
surement model used. Although I did not have a basis for 
quantitatively comparing the effect size measures from 
both models, a qualitative comparison shows that they are 
quite similar, but that the range of IRT effect sizes was 
compressed relative to those for CTT. Based on these 
findings, I recommend that researchers continue to use 
CTT measurement models but consider 
examining changes in item performance as well as student 
performance. That said, there is a stronger statistical basis 
for making claims based on such analyses under the IRT 
measurement model. The obvious trade-off is ease of inter- 
pretability for audiences not familiar with IRT. As stated 
above, this is a major limitation of using IRT. Lack of 
familiarity with probabilistic modeling of abilities and 
item difficulties makes modeling, interpretation, and 
communication of results difficult. 

In addressing the second research question (Do the IRT 
approaches to measuring change on the FMCE lead us to 
make different inferences about student learning of New- 
tonian Physics?), the answer is less clear. Using the 
dichotomous 47-item Rasch analysis, the answer would be 
“no.” However, using the testlet-based approach and a 
polytomous Partial Credit Model, I might have a different 
answer. In order to deal with the violation of the assump- 
tion of local independence, items with common answer 
pools were grouped into testlets. The secondary analysis of 
the change in item difficulty effect sizes for these testlets 
provided information that was not available under the CTT 
measurement model. Specifically, these common item 
groupings that dealt with similar content could be more 
easily compared to one another so that different inferences 
could be made about these groups. The take-home 
message from this analysis is to consider analyzing con- 
ceptually coherent item groupings in addition to aggregate 
test scores. But a more nuanced implication is that one 
must have a theory for defining such item groupings, 
which is a tenet of construct-based measurement and IRT 
modeling. In the case of these analyses, that theory was 
both statistical (based on local item dependence) and 
content oriented. Such groupings need to be theoretically 
defined in order to support the analyses used and infer- 
ences drawn. A set of items that look similar may not 
constitute a theoretically based grouping upon which 
inferences can be made. Further, from a validity stand- 
point one must understand the very real limitations of 
choosing a subset of items from a test. In doing so, the 
construct has changed, and previous validity evidence may 
no longer support such a use. 

From a statistical standpoint, the next steps in this line of 
research should focus attention in two areas: (a) developing 
a non-parametric method for comparing change in item 
difficulties under the two measurement models, and (b) 
using this method to make comparisons of gains (from the 
perspective of change in item performance) between differ- 
ent semesters of the same course. Once researchers have a 
sound statistical basis for making these between-semester 
comparisons, we can better compare gains from the per- 
spective of changing item performance to those character- 
ized by the normalized gain <g>. From the standpoint of 
the science educator and science education researcher, future 
research should examine the degree to which the analysis of 
conceptually coherent (e.g., theoretically defined) item 
groupings can provide insight into change in students’ 
conceptual understanding, while acknowledging the potential 
threats to validity that such an approach might introduce. 
The current use of aggregate scores from concept 
inventories may be too blunt an instrument in some cases, 
especially when we also have at our disposal a much sharper 
instrument. 


References 

American Educational Research Association, American Psychological Asso- 
ciation, & National Council on Measurement in Education (1999). Stan- 
dards for educational and psychological testing. Washington, DC: 
American Educational Research Association. 

Bereiter, C. (1963). Some persisting dilemmas in the measurement of change. 
In C. W. Harris (Ed.), Problems in measuring change (pp. 3-20). Madison: 
University of Wisconsin Press. 

Bond, T. G., & Fox, C. M. (2007). Applying the Rasch model: Fundamental 
measurement in the human sciences. Mahwah, NJ: L. Erlbaum. 

Bonham, S. W., Deardorff, D. L., & Beichner, R. J. (2003). Comparison of 
student performance using web and paper-based homework in 
college-level physics. Journal of Research in Science Teaching, 40(10), 
1050-1071. 

Cronbach, L. J., & Furby, L. (1970). How should we measure “change”: Or 
should we? Psychological Bulletin, 74(1), 68-80. 

Cummings, K., Marx, J., Thornton, R. K., & Kuhl, D. (1999). Evaluating 
innovation in studio physics. American Journal of Physics, 67(S1), 
S38-S44. 

Finkelstein, N. D., & Pollock, S. J. (2005). Replicating and understanding 
successful innovations: Implementing tutorials in introductory physics. 
Physical Review Special Topics—Physics Education Research, 1, 010101- 
1-010101-13. 

Hake, R. (1998). Interactive-engagement vs traditional methods: A six- 
thousand-student survey of mechanics test data for introductory physics 
courses. American Journal of Physics, 66(1), 64-74. 

Hake, R. (2002). Lessons from the physics education reform effort. Conser- 
vation Ecology, 5(2), art. 28. 

Hestenes, D., Wells, M., & Swackhammer, G. (1992). Force concept inven- 
tory. Physics Teacher, 30(3), 141-158. 

Masters, G. N. (1982). A Rasch model for partial credit scoring. 
Psychometrika, 47(2), 149-174. 

Mazur, E. (1997). Peer instruction: A user’s manual. Upper Saddle River, NJ: 
Prentice Hall. 

McDermott, L. C., & Shaffer, P. S. (2002). Tutorials in introductory physics. 
Upper Saddle River, NJ: Prentice Hall. 

Meltzer, D. E. (2002). The relationship between mathematics preparation and 
conceptual learning gains in physics: A possible “hidden variable” in 
diagnostic pretest scores. American Journal of Physics, 70(12), 1259-1268. 

Otero, V., Finkelstein, N., McCray, R., & Pollock, S. (2006). Who is respon- 
sible for preparing science teachers? Science, 313(5786), 445-446. 

Pellegrino, J. W., Chudowsky, N., & Glaser, R. (2001). Knowing what students 
know: The science and design of educational assessment. Washington, DC: 
National Academy Press. 

Pollock, S. J. (2004). No Single Cause: Learning Gains, Student Attitudes, and 
the Impacts of Multiple Effective Reforms. Paper presented at the Physics 
Education Research Conference, Sacramento. 

Rasch, G. (1980). Probabilistic models for some intelligence and attainment 
tests (Expanded ed.). Chicago: University of Chicago Press. 

Ruiz-Primo, M., Briggs, D. C., Iverson, H. I., Talbot, R. M., & Shepard, L. 
(2011). Impact of undergraduate science course innovations on learning. 
Science, 331(6022), 1269-1270. 


Saul, J. M. (1998). Beyond Problem Solving: Evaluating Introductory Physics 
Courses Through the Hidden Curriculum. PhD Dissertation. University of 
Maryland College Park. 

Smith, T. I., & Wittmann, M. C. (2007). Comparing three methods for teach- 
ing Newton’s third law. Physical Review Special Topics—Physics Educa- 
tion Research, 3(2), 020101-1—020101-11. 

Thornton, R. K., Kuhl, D., Cummings, K., & Marx, J. (2009). Comparing the 
force and motion conceptual evaluation and the force concept inventory. 
Physical Review Special Topics—Physics Education Research, 5(1), 
010105-1—010105-8. 

Thornton, R. K., & Sokoloff, D. R. (1998). Assessing student learning of 
Newton’s laws: The force and motion conceptual evaluation. American 
Journal of Physics, 66(4), 338-352. 

Van Domelen, D. J., & Van Heuvelen, A. (2002). The effects of a concept- 
construction lab course on FCI performance. American Journal of Physics, 
70(7), 779-780. 

Wainer, H., & Kiely, G. L. (1987). Item clusters and computerized adaptive 
testing: A case for testlets. Journal of Educational Measurement, 24(3), 
185-201. 

Wang, W., & Chyi-In, W. (2004). Gain score in item response theory as an 
effect size measure. Educational and Psychological Measurement, 64(5), 
758-780. 

Willett, J. B. (1988-89). Questions and answers in the measurement of 
change. Review of Research in Education, 15, 345-422. 

Wilson, M. (2005). Constructing measures: An item response modeling 
approach. Mahwah, NJ: Lawrence Erlbaum Associates. 

Wu, M. L., Adams, R. J., & Wilson, M. (1997). ConQuest Generalised Item 
Response Modeling Software. Australian Council for Educational 
Research. 


Author’s Notes 

1 http://www.compadre.org/psrc/items/detail.cfm?ID=924 and 
http://listserv.buffalo.edu/cgi-bin/wa?A0=physlrnr-list 

2 Interactive Engagement (IE) is “designed at least in 
part to promote conceptual understanding through inter- 
active engagement of students in heads-on (always) and 
hands-on (usually) activities which yield immediate feed- 
back through discussion with peers and/or instructors” 
(Hake, 1998). 

3 Questions 44-47 deal with mechanical energy and are 
usually not included in the analyses. It is not always clear 
if the same scoring strategy is followed in different analyses 
using the FMCE. For example, Cummings et al. (1999) 
do not include questions 44-47 and explicitly cite Thornton 
in their discussion of scoring, who also omits question 
6 from some analyses. In short, there appears to be some 
variability in the way researchers score the FMCE 
responses. 

4 This value of <g> is based on using the class average 
pre- and post-test scores, as described above in the expla- 
nation of the equation for <g>. Another approach is to 
average the individual student gains and use this as a 
measure of the class <g>. Using this method yields a <g> 
of .66 for the same data. 
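
The two approaches to computing <g> described in the last note can be sketched in Python as follows, using the standard definition of normalized gain as the actual gain divided by the maximum possible gain. The function names and the four student scores are hypothetical and are not data from this study.

```python
def gain_from_class_averages(pre, post, max_score=100.0):
    """<g> computed from class-average pre- and post-test scores."""
    mean_pre = sum(pre) / len(pre)
    mean_post = sum(post) / len(post)
    return (mean_post - mean_pre) / (max_score - mean_pre)

def average_of_individual_gains(pre, post, max_score=100.0):
    """Mean of each student's own normalized gain.

    Undefined for any student whose pre-test score equals max_score.
    """
    gains = [(b - a) / (max_score - a) for a, b in zip(pre, post)]
    return sum(gains) / len(gains)

# Hypothetical percent scores for four students (not data from this study).
pre = [20.0, 40.0, 60.0, 80.0]
post = [60.0, 70.0, 90.0, 90.0]

print(gain_from_class_averages(pre, post))
print(average_of_individual_gains(pre, post))
```

The two functions generally return different values for the same data, because the second gives equal weight to every student's gain regardless of how much headroom that student had.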
Ted Eisenberg, Section Editor 

This section of the Journal offers readers an opportunity to exchange interesting mathematical problems and solutions. Please send them to Ted 
Eisenberg, Department of Mathematics, Ben-Gurion University, Beer-Sheva, Israel or fax to: 972-86-477-648. Questions concerning proposals 
and/or solutions can be sent via e-mail to eisenbt@013.net. Solutions to previously stated problems can be seen at http://www.ssma.org/publications. 

Solutions to the problems stated in this issue should be posted before January 15, 2014 


5271: Proposed by Kenneth Korbin, New York, NY 
Given convex cyclic quadrilateral ABCD with AB=x, BC=y, and BD=2AD=2CD. 
Express the radius of the circum-circle in terms of x and y. 


5272: Proposed by Tom Moore, Bridgewater State University, Bridgewater, MA 

The Jacobsthal numbers begin 0, 1, 1, 3, 5, 11, 21, ... with general term $J_n = \frac{2^n - (-1)^n}{3}$, $\forall n \ge 0$. Prove that there are infinitely many Pythagorean 
triples like (3, 4, 5) and (13, 84, 85) that have an hypotenuse that is a Jacobsthal number. 
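
A quick numerical check of the statement of Problem 5272 can be sketched in Python, assuming the general term is the closed form (2^n - (-1)^n)/3, which reproduces the listed terms. The function name is hypothetical, and the check only confirms the two cited triples; it is not a proof.

```python
def jacobsthal(n):
    """n-th Jacobsthal number from the closed form (2^n - (-1)^n) / 3."""
    return (2 ** n - (-1) ** n) // 3

first_terms = [jacobsthal(n) for n in range(9)]
print(first_terms)

# The two triples cited in the problem, with hypotenuse c.
for a, b, c in [(3, 4, 5), (13, 84, 85)]:
    assert a * a + b * b == c * c   # it is a Pythagorean triple
    assert c in first_terms         # its hypotenuse is a Jacobsthal number
```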


5273: Proposed by Titu Zvonaru, Comanesti, Romania, and Neculai Stanciu, “George Emil Palade” General School, Buzau, Romania 
Solve in the positive integers the equation abcd + abc = (a + 1)(b + 1)(c + 1). 


5274: Proposed by Enkel Hysnelaj, University of Technology, Sydney, Australia 
Let x, y, z, α be real positive numbers. Show that if 

[condition illegible in source] 

then 

[inequality illegible in source] 

where n is a natural number. 


5275: Proposed by José Luis Diaz-Barrero, Barcelona Tech, Barcelona, Spain 
Find all real solutions to the following system of equations: 


V2+V24+...4-V2+x, +V2-V24...4 24x, =eV2, 
*. V2+V2+...4V2+x, +V2- 2+...tV¥2+x,, =3%,V2, 


V24+V2+4...4¢V24+x,, +V2-V2+...4V24+x,) =x,V2, 
V2 DEE TE ND N25 a SND, 














where n = 2. 


5276: Proposed by Ovidiu Furdui, Technical University of Cluj-Napoca, Cluj-Napoca, Romania 
(a) Let a ∈ (0, 1] be a real number. Calculate 


where ⌊x⌋ denotes the integer part of x. 
(b) Calculate 